Function Analysis of Thirty-Two American Corporate Boards *


Jerome G. Kunnath and Willard A. Kerr 


Illinois Institute of Technology 


The insight of the general public and even 
of industrial psychologists into the typical 
activities of corporate boards of directors is 
probably somewhat vague and inaccurate.
Since the corporate board is an important 
policy-making nerve center in industrial so- 
ciety, it needs to be brought within the orbit 
of psychological research. 

This study, profiting from the activity 
analyses reported by Flanagan on laboratory 
personnel (1), Gordon on airline pilots (2), 
and Wagner on dentists (3), is, however, 
focussed on group rather than individual be- 
havior. What does the corporate board do 
at its meetings? What are some of the prob- 
able determinants of what it does? It is the 
purpose of this study to investigate some of 
the behaviors of the corporate board. 


Experimental Design 


Invitations to participate in a nation-wide 
study of corporate board activities were sent 
to one board member of each of 246 cor- 
porations. These 246 names were selected 
at random from the “Corporation Direc- 
tory” section of Poor’s Register of Directors 
and Executives, 1950. A total of 32 firms
actually participated. In each instance a 
member of the firm’s board of directors com- 
pleted an “Industrial Board Member Survey” 
chart which listed 21 topics of board activity 
under the following heading: According to 
my business board experience, boards con- 
sider at how many meetings per year? The 
board member then indicated the number, out 
of twelve, of meetings per year at which each 
topic is considered. 

* Work completed in the student research program of the Industrial Psychology Laboratory of the Illinois Institute of Technology.


In size, the corporations sampled ranged 
from 50 to 25,000 personnel, the median 
being 250. Sizes of boards ranged from 2 
to 23 members, 5 being the median. Mean 
ages of members of the 32 boards ranged 
from 49 to 73, the median being 58.7. The 
average number of other corporate boards 
to which the average board member in these 
firms belongs ranges from 0 to 14, the median 
being 2.6. The per cent of board members 
who also work in the operating management 
of a firm ranges from 13.3 to 100.0, the 
median being 66.6. Fourteen of the firms 
studied were in metropolitan areas (500,000 
population or more), and 18 were in less
populated localities. Geographic distribution 
of the 32 firms was closely representative of 
the national distribution of American indus- 
try. When classified according to kind of 
industry, the breakdown of firms is as fol- 
lows: heavy, 7; heavy-light manufacturing 
and transportation, 12; light manufacturing, 
8; commercial (retailing, utilities), 3; fi- 
nance, 2. 

The median frequency of topic considera- 
tion was computed for the 32 corporations on 
each of the 21 topics. Seven hypotheses were 
formulated as relevant to explanation of 
topic behavior variance. Objective data were 
obtained in order to make at least crude 
tests of these seven hypotheses. These hypotheses pertain to metropolitan versus non-metropolitan locus of firm, number of personnel in the firm, number of members on
corporate board, per cent of board member 
overlap with operating management, kind of 
industry (most to least heavy), mean age 
of board members, and extent of service of 
average board member on other boards. 


Results


Activity profile. As indicated in Figure 1, 
the typical corporate board in this study 
gives relatively frequent attention through 
board meetings to: future business prospects 
(4.3 sessions per year); competition (3.9
sessions); quantity of output (3.8); dis- 
tribution (3.0); and, the business cycle 
(2.8). Relatively infrequent topics of board 
attention include: voting bonuses (1.1 sessions per year); obtaining capital (1.3); relations with government (1.4); company morale
(1.4); advertising (1.4); salesmanship (1.5); 
evaluation of key personnel (1.5); public re- 
lations (1.5); and salaries and wages (1.7). 
Intermediate amounts of attention are given 
to: labor relations (2.5); taxes (2.5); pric- 
ing (2.2); stock inventory (2.0); quality of 
output (2.0); relations with stockholders 
(2.0); and distribution of profits (2.0). 
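The three attention bands can be checked directly against the reported medians. The sketch below re-sorts the 21 values given in the paragraph above and assigns each to a band. The cut points (2.8 and 1.7) are an assumption inferred from where the text draws its lines; the authors do not state explicit thresholds.

```python
# Median meetings per year for each topic, as reported in the text.
reported = {
    "future business prospects": 4.3, "competition": 3.9,
    "quantity of output": 3.8, "distribution": 3.0, "business cycle": 2.8,
    "labor relations": 2.5, "taxes": 2.5, "pricing": 2.2,
    "stock inventory": 2.0, "quality of output": 2.0,
    "relations with stockholders": 2.0, "distribution of profits": 2.0,
    "salaries and wages": 1.7, "salesmanship": 1.5,
    "evaluation of key personnel": 1.5, "public relations": 1.5,
    "relations with government": 1.4, "company morale": 1.4,
    "advertising": 1.4, "obtaining capital": 1.3, "voting bonuses": 1.1,
}


def band(value, high=2.8, low=1.7):
    """Assign a median to a band using assumed cut points."""
    if value >= high:
        return "frequent"
    if value <= low:
        return "infrequent"
    return "intermediate"


bands = {}
# Sort from most to least frequently considered before grouping.
for topic, value in sorted(reported.items(), key=lambda kv: kv[1], reverse=True):
    bands.setdefault(band(value), []).append(topic)

for name in ("frequent", "intermediate", "infrequent"):
    print(f"{name} ({len(bands[name])}): {', '.join(bands[name])}")
```

With these cut points the split is five frequent, seven intermediate and nine infrequent topics, matching the lists in the text.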

Related hypotheses. The seven hypotheses 
of meaningful relation to board behavior as 
previously stated find some confirmation in 
Table 1. Metropolitan location of a firm is associated with certain board behaviors, particularly treatment of relations with stockholders, voting bonuses, distribution of profits, morale, and quantity of output.

Fig. 1. Median number of corporate board meetings per year at which each of twenty-one topics is considered, according to the reports of directors of thirty-two corporations.

Size of the organization in number of per- 
sonnel is also associated with board behavior. 
Boards of larger firms give greater attention 
to future business prospects, taxes, output, 
distribution, pricing, evaluation of key per- 
sonnel, stock inventory, and distribution of 
profits. 

Size of the board itself is significantly re- 
lated with board emphasis on such topics as 
the business cycle, distribution of profits, and 
advertising. 

Extent of board personnel overlap with 
operating management personnel is associ- 
ated with assignment of little attention to 
advertising and pricing. 

The heavier the industry, the less atten- 
tion does the board tend to devote to voting 
bonuses and to quantity of output. An in- 
teresting tendency also exists for the heavy 
industry boards to give more frequent atten- 
tion to labor relations. 

Mean age of board members per board is 
unrelated to the board behaviors investigated 
in this study. 

The extent to which the board is com- 
posed of members with memberships on 
other boards is associated with greater board 
emphasis upon distribution, quantity of out- 
put, quality of output, relations with stock- 
holders, distribution of profits, advertising, 
the business cycle, taxes, competition, and 
company morale. 
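The associations in this section are the tetrachoric coefficients of Table 1, which estimate a correlation from a two-by-two split of each variable. The text does not say how the coefficients were computed or where each variable was split. The sketch below assumes median splits and uses the classic cosine-pi approximation, r ≈ cos(π / (1 + √(AD/BC))), rather than a full tetrachoric estimate based on the bivariate normal, which can differ from it. The cell counts are hypothetical, chosen only so the margins match the 14 metropolitan and 18 non-metropolitan firms.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation of a 2x2 table.

    Cell layout (a and d are the cells where the two splits agree):
        a = high on X, high on Y    b = high on X, low on Y
        c = low on X, high on Y     d = low on X, low on Y
    """
    if b * c == 0:
        if a * d == 0:
            raise ValueError("table too sparse for this approximation")
        return 1.0  # no disagreeing cases: perfect positive association
    ratio = math.sqrt((a * d) / (b * c))
    # ratio = 1 (no association) gives cos(pi/2) = 0; ratio = 0 gives cos(pi) = -1
    return math.cos(math.pi / (1.0 + ratio))


# Hypothetical split of 32 firms: metropolitan (14) vs. not (18), crossed with
# boards above vs. below the median frequency on some topic (16 and 16).
print(round(tetrachoric_cos_pi(10, 4, 6, 12), 2))
```

Because the approximation needs only the four cell counts and a cosine, it is quick to compute by hand, but it is only an approximation to the coefficient reported in Table 1.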


Conclusions 


Insofar as these data are valid estimates 
of activity emphases of corporate boards, the 
following conclusions may be warranted: 

1. The frequent topics of board attention 
are future business prospects, competition, 
quantity of output, distribution, and the 
business cycle. 

2. Moderate board attention is given to 
labor relations, taxes, pricing, inventory, 
quality, stockholder relations, and distribu- 
tion of profits. 

3. Relatively infrequent attention is assigned to salaries and wages, public relations, evaluation of key personnel, salesmanship, advertising, morale, government relations, obtaining capital, and voting bonuses.


Table 1


Tetrachoric Coefficients* of Correlation Between Corporate Board Emphases on Certain Topics and Seven Referent Variables


Referent variables (columns): 1. Metropolitan; 2. No. of Personnel; 3. No. on Board; 4. Board Overlap with Mgt.; 5. Heavy Industry; 6. Mean Age of Board Members; 7. Responsibilities to Other Boards.

Topics (rows): future business prospects; pricing; stock inventory; output (quality); output (quantity); distribution; salesmanship; advertising; competition; business cycle; labor relations; public relations; relations with government; relations with stockholders; taxes; evaluation of key personnel; salaries and wages; company morale; obtaining capital; distribution of profits; voting bonuses.

[Coefficient values are not legible in the source scan.]

* All coefficients for which the probability of non-chance meaning is .95 or better are indicated in italics.

4. In general the topics given most fre- 
quent attention by boards are those related 
to immediate corporate survival, while those 
less frequently treated topics tend to be re- 
lated either to the internal workings of the 
company or to special staff or usually dele- 
gated functions. 

5. Such mentally stimulative factors as 
metropolitan environment, large number of 
personnel, large board, and particularly many 
board members who serve simultaneously on 
other boards are associated with more fre- 
quent consideration of practically all of the 
21 topics. 

6. Board overlap with operating manage- 
ment tends to be inversely related with fre- 
quency of consideration of the various topics. 
The only notable exception (significant at non-chance probability of .90) to this generalization is frequency of attention to labor
relations, which is considered more frequently 
in the “overlap” boards. This latter tendency 
may be due in part to defensive attitudes 
(defensive of management) of board mem- 
bers who also are a part of operating man- 
agement. Insofar as this restriction of prob- 
lem consideration is a result of board-man- 
agement overlap, it may be a psycho-economic 
argument against allowing board members 
also to serve in operating management. 
These data do suggest that such overlap, 
when excessive, may interfere with the prob- 
lem-raising and problem-solving processes in 
corporate enterprise. 

7. Average member “responsibilities on 
other boards” is a variable which probably 
connotes experience and exceptional ability. 
It seems significant that boards so favored 
place notably greater board meeting emphasis
on competition and quality of output. It
also seems of importance that none of the
other six “hypothesis” variables correlates 
significantly with board emphasis on either 
competition or quality of output. 

8. Mean age of board members was not a 
significant predictor of topics emphasized at 
board meetings. 


Received June 20, 1952. 


The Curve of Output as a Criterion of Boredom *


Patricia Cain Smith 


Cornell University 


The purpose of this study was to investi- 
gate the relationship between the experience 
of boredom and changes in rate of output 
or shape of production curves for industrial 
workers. The classic investigations of the 
British Industrial Fatigue Research Board 
(5, 6, 7, 8, 9) have satisfied the writers of our 
textbooks that the experience of monotony or 
boredom is characteristically accompanied by 
changes in the rate of output, and even that 
the nature of the worker’s experience may be 
identified by examination of the shape of the 
curve of output. A re-examination of the 
work of the British investigators was made 
necessary by certain deviations from normally 
acceptable methods of scientific investigation, 
which will be discussed later in this paper. 

As early as 1941, Roethlisberger and Dick- 
son failed to duplicate the English results. 
They stated: “With respect to the monotony 
hypothesis, no definite conclusion could be 
drawn. A curve resembling what is claimed 
to be a typical monotony curve was not en- 
countered except in the case of Operator 1A. 
It was clearly understood, however, that 
monotony in work is primarily a state of 
mind and cannot be assessed on the basis of 
output alone” (2, p. 127). 

In 1946, Rothe undertook an investigation 
of the characteristics of production data, 
recognizing their importance as criteria in a 
wide variety of industrial investigations. He 
found that individual daily work curves “may 
take any of many different forms and do not 
assume any characteristic, predictable pat- 
tern” (3, p. 209). Correlations of work 
curves for the same operators for different 
days varied widely, the median correlation 
being approximately .05. Rothe averaged work curves for each worker for one week, and obtained trend lines which were classified by inspection. Four of these curves were “mixed curves,” two were “fatigue curves,” and two were “monotony curves.”

* This paper is a portion of a dissertation presented as partial fulfillment of the requirements of the degree of doctor of philosophy at Cornell University. The writer is deeply indebted to Dr. T. A. Ryan for his guidance, and to the management, union officials, and workers whose active cooperation and assistance made the study possible.

Rothe was interested in determining 
whether knowledge of the production curve 
for any individual or group for a specific
work period would permit prediction of the 
characteristics of future work curves. Neither 
he nor Roethlisberger and Dickson attempted 
to relate the shape of the work curves which 
they obtained to the experience of the in- 
dividual worker. Rothe’s study, moreover, 
was performed using hourly-paid workers 
whose work flowed in a continuous and un- 
interrupted manner, so that his results could 
not be directly applied to the very different 
incentive conditions obtaining for piece-rate 
workers whose work is grouped into lots or 
bundles. 

Since the existence of any convenient overt 
indicator of the psychological state of the 
worker would be of obvious practical im- 
portance, and would be highly useful for re- 
search purposes as well, the present investiga- 
tion of the relationship between reported 
boredom and changes in the curve of output 
was undertaken. Also included in this investigation were such other proposed behavioral indices of boredom as talking, variability of production, and frequency of voluntary rest pauses.

This study was conducted in a small knit- 
wear mill in northern Pennsylvania. Most 
operators in the mill, and all operators studied 
in detail here, were paid by piece rate. Two 
operations were chosen for observation. Both 
were: (1) short enough so that variations in 
production would show up in the output 
curves; (2) long enough to permit timing of 
several operators at once; (3) performed in 
a uniform manner by several experienced 
operators; and (4) largely manual, so that 
the operator rather than the machine determined the rate of production. Eight women were engaged in each operation.


The Criterion 


The subjective feelings of the workers were 
determined both by interview and by questionnaire. The British investigators used as
a criterion of boredom the answers to a series 
of interview questions. It is not clear how 
the investigators avoided the influence of 
suggestion, since many of their questions in- 
quire about the possibility of boredom and 
mention slowing of work at particular times 
of the day. A rather strange circularity in 
their reasoning is also evident. Their cri- 
terion of severity of boredom symptoms con- 
sisted of a total weighted score on a number 
of questions “based upon the various symp- 
toms of boredom and discontent observed in 
previous investigations” (9, p. 2). These 
questions included several involving changes 
in feelings and rate of working with the pas- 
sage of time. (For example, “Do you think 
you work better in the morning or the after- 
noon? Why? When do you think you work 
best in the morning? Why? In what part 
of the morning does time seem to pass most 
quickly? Why? Do you feel bored at any 
time during the morning? When?” etc.) A 
report of boredom or slowing of work in the 
middle of the work spell was considered to 
be an indication of greater boredom than a 
report at other periods of time, and was, 
therefore, weighted more heavily. A total
weighted score was thus compiled, and it 
was against this criterion that the investigators compared their classifications of production curves. It is not startling, then, that
those workers who thought that they slowed 
in the middle of the working period produced 
output curves with a sag in the middle, and 
that the investigators were therefore able 
to find fairly good agreement between bore- 
dom and shape of the output curves. 

Such questions from the British list were 
therefore eliminated. The remainder were 
further modified so that they could be used 
in questionnaire form, and several others that 
seemed relevant were added. The business 
agent of the union called a special meeting 
of workers who were not to be part of the
major study so that the preliminary form of
the questionnaire could be tried out. After 
several revisions, the questionnaires were ad- 
ministered to seventy-five workers, including 
those observed in this study. At this point 
the help of the union was especially im- 
portant in securing frankness and coopera- 
tion. The answers to the criterion questions 
were item-analyzed against total score, and 
a weighted criterion score devised. 

In addition to the questionnaires, each of 
the workers was interviewed at least once. 
They responded quite freely, expressing their 
attitudes, favorable and unfavorable, toward 
various aspects of the factory situation, and, 
of course, toward management personnel. 
On the basis of interview and questionnaire 
data, which agreed closely, the workers were 
classified according to the degree of their 
experience of monotony. These classifica- 
tions were substantiated by gratuitous com- 
ments given from time to time during the 
observations. 


Procedure and Results 


The first operation studied involved using 
a power sewing machine with a specialized 
folder attachment to hem the bottoms and 
the sleeves of men’s cotton T-shirts. Work 
was counted in bundles of five dozen, one 
bundle being completed approximately every 
half-hour. All workers included in the study 
had been on the job over a year. Only a 
few of the operators reached standard. The 
spread between the guaranteed wage and 
standard was very large, however, so that 
almost all workers were receiving piece-rate 
earnings in addition to the guarantee. Work- 
ers, as might be expected, considered stand- 
ards somewhat “tight, but not out of line” 
with others in the mill. 

Attitudes toward management (and union) 
varied greatly within the group, several of the 
workers expressing very strong pro-manage- 
ment opinions and several speaking just as 
vociferously against company policies. A 
few questions concerning job satisfaction were 
included in the questionnaire. Results in- 
dicated that job satisfaction was at least 
fair for the 75 workers questioned. For ex- 
ample, on a scale running from 1 (“I would 
not stay on my job for a minute if I could
get something else”) to 6 (“I feel it is the
ideal job for me”), the median choice was
3 (“I like this job about as well as any job 
which pays the same”) and 70 per cent 
checked 3 or better. Furthermore, 90 per cent indicated that they now liked their work better than when they started.

Operators were observed continuously for 
one week. The times were recorded for the 
beginning and completion of each bundle and 
the beginning and end of all delays, whether 
voluntary or caused by conditions beyond the 
operator’s control. Nature of delays was 
noted, as well as the time of the occurrence of 
talking and singing. 

Figure 1 shows the results of the observa- 
tions concerning rate of output for one worker 
for the entire week. This operator had a 
very high boredom score, and reported on the 
interview that she was bored “from the first 
thing in the morning until the mill closes.” 
The ordinates in the figure represent the 
number of garments sewed per minute, each 
block representing a bundle of five dozen; 
units on the base line are ten-minute periods through the working day, which is supposed to extend from 7 to 12 and from 1 to 4:30. It is not clear from the English reports how the investigators handled stoppages
for voluntary rest pauses. Rothe (3, p. 202) 
pro-rated his curves across the gaps caused 
by unscheduled pauses. We indicated volun- 
tary stops by a drop to the base line, pro- 
rating the data to form a continuous line 
when the stoppage was beyond the control of 
the operator. 
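The plotting convention just described can be sketched in code. This is one reading of it, not the author's procedure: each bundle of five dozen garments becomes a segment whose height is garments per minute; minutes lost to involuntary stoppages are removed from the bundle's working time so the line stays continuous, and gaps between bundles are assumed (for illustration only) to be voluntary pauses drawn at zero. The `Bundle` class and the times are hypothetical.

```python
from dataclasses import dataclass

GARMENTS_PER_BUNDLE = 60  # one bundle = five dozen garments


@dataclass
class Bundle:
    start: float  # minutes after the start of the work spell
    end: float  # minutes after the start of the work spell
    involuntary_delay: float = 0.0  # minutes lost to causes beyond the operator's control


def output_segments(bundles):
    """Return (start, end, garments per minute) segments for plotting.

    Involuntary delays are taken out of the bundle's working time, so the
    rate is carried across them as a continuous line. Gaps between bundles
    are treated (by assumption) as voluntary pauses and get a rate of zero,
    i.e. a drop to the base line.
    """
    segments = []
    previous_end = None
    for b in sorted(bundles, key=lambda bundle: bundle.start):
        if previous_end is not None and b.start > previous_end:
            segments.append((previous_end, b.start, 0.0))
        working_minutes = b.end - b.start - b.involuntary_delay
        segments.append((b.start, b.end, GARMENTS_PER_BUNDLE / working_minutes))
        previous_end = b.end
    return segments


# Hypothetical morning: a 5-minute voluntary pause after the second bundle and
# a 4-minute machine stoppage during the third.
morning = [Bundle(0, 30), Bundle(30, 54), Bundle(59, 99, involuntary_delay=4)]
for start, end, rate in output_segments(morning):
    print(f"{start:5.1f} to {end:5.1f} min: {rate:.2f} garments per minute")
```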

These charts illustrate the extreme incon- 
sistency of the shapes of the curves from day 
to day. Curves for the other workers were 
equally dissimilar. Wyatt, Langdon, and 
Stock (9) reported that they kept detailed 
records of output “in some cases . . . over a 
period of two or three months” for 68 work- 
ers. Apparently these daily work curves 
were averaged to obtain a typical curve for 
each worker, and were classified according to 
shape, presumably by inspection. The curve 
classifications were then compared with their 
criterion of boredom, with fairly good agreement. Averaging daily figures from the present investigation resulted in a meaningless
flat curve in all cases. The weekly averages 
were thus useless for criterion purposes. It 
was thought that the averaging process might 
be obscuring the characteristic shape of the 
daily curves. Two judges, therefore, at- 
tempted independently to classify the daily 
curves into four categories: (1) gradually ascending curves (the shape classically considered to indicate boredom without fatigue);
(2) U-shaped curves (the mixed boredom- 
fatigue type which was supposed to occur 
most commonly on monotonous tasks); (3) 
gradually descending curves (supposed to 
indicate fatigue); and (4) miscellaneous 
curves which did not appear to belong in 
any of the standard classifications (includ- 
ing flat or straight curves). 
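Why averaging produced flat curves is easy to reproduce when daily shapes differ. The sketch below uses four invented daily curves (rising, falling, sagging, peaking) whose point-by-point average is flat. It also includes a crude rule for the four categories above; the rule and its `tolerance` parameter are hypothetical, since the judges in this study classified curves by inspection.

```python
from statistics import mean


def average_curve(days):
    """Point-by-point average of several equal-length daily curves."""
    return [mean(values) for values in zip(*days)]


def classify(curve, tolerance=0.1):
    """Hypothetical rule for the four categories named in the text.

    Compares the mean rate in the first, middle and last thirds of the day;
    a change smaller than `tolerance` (as a fraction) is treated as no change.
    """
    k = len(curve) // 3
    first = mean(curve[:k])
    middle = mean(curve[k:len(curve) - k])
    last = mean(curve[len(curve) - k:])
    if middle < min(first, last) * (1 - tolerance):
        return "U-shaped"
    if last > first * (1 + tolerance):
        return "ascending"
    if last < first * (1 - tolerance):
        return "descending"
    return "miscellaneous"


# Invented rates (garments per minute) for six periods on four days.
days = [
    [1.6, 1.7, 1.8, 1.9, 2.0, 2.1],  # rises through the day
    [2.1, 2.0, 1.9, 1.8, 1.7, 1.6],  # falls through the day
    [2.1, 1.8, 1.6, 1.6, 1.8, 2.1],  # sags in the middle
    [1.6, 1.9, 2.1, 2.1, 1.9, 1.6],  # peaks in the middle
]
print([classify(day) for day in days])
print([round(v, 2) for v in average_curve(days)], classify(average_curve(days)))
```

Each day has a clear shape, yet the weekly average is a level line and falls into the miscellaneous category.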

In the entire group, there was no ascend- 
ing curve. Judges agreed that there was one 
possibly U-shaped curve; unfortunately for 
the hypothesis, however, the operator re- 
ported that she was almost never bored, and 
was not bored that particular afternoon. 
Each judged approximately one-third of the 
curves as descending, but judges agreed on 
less than half of these. The remainder of 
the curves were considered unclassifiable by 
both judges. The relationships were not, of 
course, significantly different from chance 
when tested by the chi-squared test. It was 
not possible to utilize correlational techniques 
for comparing curves, as recommended by 
Rothe (3, p. 210), since these data were 
gathered in terms of equal work units rather 
than equal time units. It seemed apparent, 
however, that the individual work curves 
could not be reliably classified according to 
standard categories. 
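The obstacle to Rothe's correlational comparison is that these records give one rate per bundle, while correlating two days requires values at the same time points. One way to bridge that, assumed here and not used in the study, is to spread each bundle's garments evenly over its duration, total them in equal time bins, and then correlate the binned series. The function names and bundle times are hypothetical.

```python
import math
from statistics import mean


def garments_per_bin(bundles, bin_minutes, day_minutes, garments=60):
    """Spread each bundle's garments evenly over its (start, end) interval
    in minutes and total them in equal time bins."""
    n_bins = math.ceil(day_minutes / bin_minutes)
    totals = [0.0] * n_bins
    for start, end in bundles:
        rate = garments / (end - start)
        for i in range(n_bins):
            lo, hi = i * bin_minutes, (i + 1) * bin_minutes
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                totals[i] += rate * overlap
    return totals


def pearson_r(x, y):
    """Product-moment correlation (undefined if either series is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical hour on two days: two bundles each, finished at different paces.
monday = garments_per_bin([(0, 20), (20, 60)], bin_minutes=10, day_minutes=60)
tuesday = garments_per_bin([(0, 40), (40, 60)], bin_minutes=10, day_minutes=60)
print(monday, tuesday, round(pearson_r(monday, tuesday), 2))
```

Spreading output evenly within a bundle hides any variation inside it, so binned rates are only as fine-grained as the bundles themselves.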

It was considered possible that social group- 
ings were influencing the rates of working, 
thus accounting for the variability in shapes 
of work curves from day to day. During 
working hours workers on this operation 
formed two clearly defined social groups, 
with several isolated members. Visits to 
workers’ homes and recreational areas estab- 
lished the fact that social groupings in the 
plant corresponded only to a minor extent 
with those formed after working hours. Two 
judges examined the work curves for each
work period and attempted to group together 
curves of similar shape, disregarding tradi- 
tional classification systems. Groupings by 
the judges did not agree to a significant ex- 
tent. Moreover, the groupings of neither 
judge agreed with either work or recreational 
social groupings of the workers. None of
these relationships was significantly different 
from chance when tested by the chi-squared 
test. Neither in the work nor in the recrea- 
tional groups was there any evidence that 
work curves of members resembled one an- 
other for the same period of work. 
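The chi-squared comparisons reported above amount to a test of independence on a contingency table, for example a judge's grouping of curves against the workers' social groups. The study's tables are not given, so the counts below are hypothetical. With only eight workers the expected frequencies fall below five, so the chi-squared approximation is rough and an exact test would be more defensible; the critical values are the standard upper 5 per cent points.

```python
# Upper 5 per cent points of the chi-squared distribution, df = 1 to 6.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical: rows = judge's grouping of eight workers' curves,
# columns = the workers' social group on the floor.
table = [[3, 1],
         [1, 3]]
stat, df = chi_squared(table)
print(f"chi-squared = {stat:.2f}, df = {df}, significant at .05: {stat > CRITICAL_05[df]}")
```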

It seemed likely that another operation
would yield more traditional results. One 
was chosen, therefore, in which there were 
two portions to the task, which could be ob- 
served and timed separately. The job was 
called taping. Two short stiffened pieces of 
cotton tape were sewed on the unhemmed 
bottom of the shirt. After the operator fin- 
ished sewing the bundle, she cut the threads 
and folded the shirts. Again the operators 
were timed continuously for one week, and
again no characteristic curves were found 
for cutting, for sewing, or for the two com- 
bined. Similarly, there seemed to be no rela- 
tionship between any daily work curve and 
the reported feelings of the operator. Again, 
the only operator who produced either ascend- 
ing or U-shaped curves reported that, despite 
the numerous disadvantages of her work and 
the company, she was certainly not bored. 
Thus the production curve criteria proved 
not only unreliable, in that observers could 
not agree upon classification of curves, but 
invalid as well. 

One of the major difficulties in the use of 
production curves as criteria lies in the analy- 
sis of data. There are no satisfactory statis- 
tical measures available for the comparison 
of the shapes of curves. Interpretation of the 
results of correlational analyses is sometimes 
made difficult by the peculiarities of the dis- 
tributions involved. Moreover, the correla- 
tion coefficient cannot allow for over-all simi- 
larities in the shapes of the curves, when the 
changes of slope are displaced slightly in 
time. The alternative method of visual in- 
spection is subjective and, apparently, un- 
reliable. Both methods show little agree- 
ment from day to day, so that even though 
it could be demonstrated that daily curves 
reflected the experience of the individual 
workers, it would be highly unlikely that any 
long-term relationships could be demon- 
strated. Their use in this kind of situation 
appeared to be impractical. 
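The limitation of the correlation coefficient for displaced slope changes can be shown with invented numbers: a curve with a mid-spell sag, and a copy of the same shape whose sag comes two periods later. The plain correlation is low even though the shapes are identical. A lagged comparison, shown only as an illustration and not something used in the study, recovers the match.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation of two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(x, y, max_lag):
    """Correlate x with y shifted earlier by 0..max_lag periods."""
    return {lag: pearson_r(x[:len(x) - lag], y[lag:]) for lag in range(max_lag + 1)}


# Invented curve with a mid-spell sag, and the same shape two periods later.
day_a = [6, 5, 4, 3, 2, 3, 4, 5, 6, 7]
day_b = [6, 6] + day_a[:-2]
scores = lagged_correlations(day_a, day_b, max_lag=3)
best = max(scores, key=scores.get)
print(f"r at lag 0 = {scores[0]:.2f}; best lag = {best}, r = {scores[best]:.2f}")
```

Lagging helps only when the whole curve is displaced by the same amount; real work curves, as the text notes, change shape from day to day as well.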

Other changes in behavior which have been 
related to boredom include frequency of talk- 
ing, frequency of rest pauses, variability in 
rate of working, and average speed of work- 
ing. It was possible to rank the workers 
within each group on each of these factors
and on intensity of boredom symptoms, esti- 
mated from both questionnaire scores and in- 
terview responses. The rankings were com- 
pared and the relationships tested for sig- 
nificance by Kendall’s non-parametric tau 
test (1, pp. 403-408). No significant or even
consistent relationship appeared between the 
boredom symptoms and the proposed indices. 
Reliability of the behavioral indices was esti- 
mated by comparing total rankings for each 
worker on Monday, Tuesday and Wednesday 
with the totals for Thursday and Friday. 
All of these relationships proved significant 
at the 5 per cent level or better by Kendall's 
tau test. Individual differences were, there- 
fore, reasonably stable throughout the week. 
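Kendall's tau, used above both for validity (boredom ranks against behavior ranks) and for reliability (early-week against late-week ranks), counts pairs of workers ranked in the same order (concordant) or opposite order (discordant). The sketch below computes tau for untied ranks and the large-sample normal approximation for testing it against zero. The ranks are invented, and with only eight workers exact tables are the safer reference for significance.

```python
import math
from itertools import combinations


def kendall_tau(rank_x, rank_y):
    """Kendall's tau for two untied rankings of the same workers."""
    n = len(rank_x)
    s = 0
    for i, j in combinations(range(n), 2):
        product = (rank_x[i] - rank_x[j]) * (rank_y[i] - rank_y[j])
        s += (product > 0) - (product < 0)  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)


def tau_z(tau, n):
    """Normal approximation for testing tau against zero (rough for small n)."""
    return 3 * tau * math.sqrt(n * (n - 1)) / math.sqrt(2 * (2 * n + 5))


# Hypothetical ranks of eight operators on talking frequency, from
# Monday-Wednesday totals and from Thursday-Friday totals.
early_week = [1, 2, 3, 4, 5, 6, 7, 8]
late_week = [2, 1, 3, 4, 6, 5, 7, 8]
tau = kendall_tau(early_week, late_week)
print(f"tau = {tau:.3f}, z = {tau_z(tau, len(early_week)):.2f}")
```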


Discussion 


Why were these results so different from 
those of the British Industrial Research 
Board? In the first place, comments of the 
workers showed that each had her own con- 
cept of the number of bundles that she should 
complete in a day. If she was behind
schedule, she hurried toward the end of the 
day; if she was ahead, she slackened speed 
or stopped entirely. One operator, who had 
just completed all but one of her customary 
bundles for the day, commented, “You've 
seen how fast we can do them. Now do you 
want to see how slow?” Production figures 
reflected quite clearly what the workers con- 
sidered to be the proper pace for them at 
that particular time, but not at all neces- 
sarily the way they felt about their work. 

It has been the observation of the writer 
that such pacing of work occurs with much 
greater frequency in industrial situations than 
does spontaneous variation in rate. Even 
when there is no restriction due to fear of 
rate-cutting, it is normal for any worker to 
decide in advance how much he will produce, 
and earn, each day. Effort is unquestionably 
pegged, at least within narrow ranges, in 
most industrial situations. 

A careful re-examination of the English 
studies suggests several differences in method 
which perhaps further account for the dis- 
crepancy between our results and theirs. The 
most serious has already been mentioned; 
they included in their criterion items which
were related to changes of rate of working, 
and weighted these items in the direction 
favorable to their hypothesis. The reader is 
not told, moreover, whether or not their 
curves were classified without knowledge of 
the accompanying verbal reports. Several 
other factors apparently operated to make 
the shape of their curves more consistent 
from day to day. Although they do not 
specify the kinds of jobs involved, one would 
infer from comparison of the various re- 
ports that at least six different operations 
were involved, with various hours of work 
and methods of payment. Such variations in 
jobs and conditions would tend to mask in- 
dividual variability. 

One last factor should be noted. There is 
no indication in any of their data of volun- 
tary rest pauses, even for rest-room visits. 
If decreases in production due to such work 
stoppages were averaged into their curves, 
this procedure would account for the con- 
sistency of the curves from day to day, as 
well as for the preponderance of U-shaped 
curves, since rest-room and water-fountain visits tend to be made at about the same time every day, and mostly in the middle of
the work period. 
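The rest-pause explanation offered above can be illustrated with invented numbers: three daily curves of different shape (rising, falling, level), each with a rest-room or water-fountain visit in the middle period. Averaged point by point, the differences in shape cancel and the only remaining feature is a mid-spell sag, the pattern that would be read as a consistent U-shaped curve.

```python
from statistics import mean

# Invented garments-per-minute for nine periods of a work spell on three
# days with different overall shapes; each day includes a rest-room or
# water-fountain visit in the fifth (middle) period.
days = [
    [2.0, 2.1, 2.2, 2.3, 1.1, 2.3, 2.4, 2.5, 2.6],  # rising day
    [2.6, 2.5, 2.4, 2.3, 1.1, 2.3, 2.2, 2.1, 2.0],  # falling day
    [2.3, 2.3, 2.3, 2.3, 1.0, 2.3, 2.3, 2.3, 2.3],  # level day
]

# Point-by-point average across days.
average = [mean(values) for values in zip(*days)]
print([round(v, 2) for v in average])
lowest = average.index(min(average))
print("lowest period:", lowest + 1)
```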


Summary 


Continuous observation of two groups of 
eight women each, operating power sewing 
machines on light, uniform and repetitious 
work, led to the following conclusions: 

1. There were fairly stable individual dif- 
ferences in speed of working, variability of 
production, frequency of rest pauses, and fre- 
quency of talking. 

2. These differences showed no consistent 
relationship to the reports of the workers 
concerning their feelings of boredom or mo- 
notony. 

3. No shape of work curve was found 
which would characterize the individual 
worker. 

4. Work curves for individuals forming so- 
cial groups showed no observable relationship 
with each other. 

5. The approach of the closing hour had 
a noticeable effect on the production of many 
of the workers. The direction of the change
in rate which appeared at the end of the day 
was determined by the concept of a day's 
work held by the worker. 

6. Boredom is not necessarily accompanied 
by a depression in the curve of output, nor is 
a sag necessarily accompanied by feeling of 
boredom. 

7. Output curves should be viewed with 
caution as indications of the subjective feel- 
ings of the worker. 

There can be little quarrel with the claim 
of the British investigators that, other fac- 
tors being equal, workers tend to slow down, 
talk, become restless and variable in their 
production when bored. In most industrial 
situations, however, one cannot assume that 
all other factors are equal, and many of these 
factors may heavily outweigh the influence 
of interest or boredom in producing changes 
in working behavior. 


Received May 28, 1952. 


Predicting Success in Elementary Accounting


Office of Student Personnel and Guidance, University of Wyoming


A recent article by Traxler¹ regarding the use of objective tests for the selection of personnel in the professional field of accounting encourages
encourages further investigation of the suita- 
bility of a number of tools for the prediction 
of success in college courses in accounting. 
This study represents a preliminary investiga- 
tion of the relative validity of Form C of the 
American Institute of Accountants Orienta- 
tion Test (AIA), the 1947 Edition of the 
American Council on Education Psychologi- 
cal Examination (ACE), Form 23 of the 
Ohio State University Psychological Test 
(OSU), and the accountant scale of the 
Strong Vocational Interest Blank for Men 
(SVIB), for predicting success in elementary 
accounting at the University of Wyoming. 

In the fall of 1949, the AIA Orientation 
Test (Form C) and the Strong Vocational In- 
terest Blank for Men were administered to 
110 freshmen students enrolling in elemen- 
tary accounting in the College of Commerce 
and Industry at the University of Wyoming. 
Scores on the other two tests mentioned above 
were already available for most of these stu- 
dents and it was possible to secure the four 
test scores and accounting grades for 95 stu- 
dents out of the 110. Of this number 76 
were men and 19 were women. 

Statistical constants for the four tests and 
the criterion are presented in Table 1. 

Intercorrelations for the five variables in- 
volved were computed and are contained in 
Table 2. 

The highest coefficient of correlation, .84, 
was that between OSU and ACE. This was 
to be expected since both are tests of general 
ability to do college work. A substantial
relationship is also noticed between these two 
tests and AIA Orientation Test. No sub- 
stantial relationship seems to exist between 
SVIB and the other three tests although the 

¹ Traxler, A. E. Objective testing in the field of accounting. Educ. psychol. Measmt., 1951, 11, 427-439.
coefficient of correlation of .18 between SVIB and AIA Orientation Test is of interest. The standard error of this coefficient was .099, which places its significance between the .05 and .10 levels.
The most interesting revelation is that both 
ACE and OSU seem to be more closely re- 
lated to grades in elementary accounting than 
is AIA Orientation Test. This is especially 
interesting in view of the fact that AIA Orientation Test is intended to be “a general intelligence test slanted toward business.”²
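The significance statement can be checked with the classical large-sample standard error of a correlation, (1 − r²)/√N. The sketch below assumes that formula (the article does not say which one was used, though this one reproduces the reported .099 for r = .18 and N = 95) and compares the critical ratio r/SE with the two-tailed normal cutoffs for the .10 and .05 levels.

```python
import math


def se_of_r(r: float, n: int) -> float:
    """Classical large-sample standard error of a Pearson r: (1 - r^2) / sqrt(N)."""
    return (1 - r * r) / math.sqrt(n)


r, n = 0.18, 95                  # SVIB vs. AIA Orientation Test, N = 95
se = se_of_r(r, n)
critical_ratio = r / se

# Two-tailed normal cutoffs for the .10 and .05 levels of significance.
Z_10, Z_05 = 1.645, 1.960

print(f"SE = {se:.3f}, r/SE = {critical_ratio:.2f}")
print("significant at .10:", critical_ratio > Z_10)
print("significant at .05:", critical_ratio > Z_05)
```

The ratio of about 1.81 lies between 1.645 and 1.96, which is what "between the .05 and .10 levels" means.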

Multiple correlations were computed be- 
tween all possible pairings of the four tests 
and accounting grades. Table 3 lists the cor- 
relations obtained. If the prediction tools 
are to be limited to two out of the four tests 
considered here, it would seem that the best 
two combinations would be ACE and SVIB, 
or OSU and SVIB. Again the interesting revelation is that AIA Orientation Test is not found in either of the two best combinations of two out of four tests.
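For a pair of predictors, the multiple correlation follows directly from the three zero-order correlations. The sketch below uses the standard two-predictor formula; the correlations are hypothetical, not the values of Table 2, and are chosen only to show why a second test that overlaps heavily with the first (as ACE and OSU do, r = .84) adds less than one that is nearly independent of it.

```python
import math


def multiple_r_two(r1y: float, r2y: float, r12: float) -> float:
    """Multiple correlation of criterion y with two predictors, from zero-order r's."""
    r_squared = (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)
    return math.sqrt(r_squared)


# Hypothetical correlations (not taken from Table 2).
print(round(multiple_r_two(0.40, 0.30, 0.20), 3))  # weakly related predictors: 0.459
print(round(multiple_r_two(0.40, 0.30, 0.60), 3))  # strongly overlapping predictors: 0.407
```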

Multiple correlations were also computed 
between all combinations of three out of four 
tests and accounting grades. These correla- 
tions are recorded in Table 4. 

The best combination of three tests for 
predicting success in elementary accounting 
is apparently ACE, OSU and SVIB (Account- 


Table 1

Means and Standard Deviations of Test Scores and Fall Quarter Grades in Elementary Accounting*

Variable                              Mean      S.D.
ACE Psychol. Exam.                    110.5
OSU Psychol. Test                     079
AIA Orientation Test                  33.9
Strong Interest Blank, Acctg. Key     36.9
Elem. Accounting Gr.†                 2.9

*N = 95.
† Grades given at the University of Wyoming are as follows: I (A), II (B), III (C), IV (D), and V (Failure).


² Ibid., p. 428.
Table 2

Intercorrelations of Test Scores and Grades in Elementary Accounting for the Fall of 1949*

                                      OSU        AIA        Strong       Elem.
                                      Psychol.   Orient.    Interest     Accounting
                                      Test       Test       Blank        Grades†
ACE Psychol. Exam.                    .84        .66        .00          .36
OSU Psychol. Test                                .52        (illegible)  .37
AIA Orient. Test                                            .18          .32
Strong Interest Blank, Acctg. Key                                        .26

*N = 95.
† Due to the grading scheme employed, e.g., A = 1, B = 2, etc., these coefficients of correlation as computed were all negative, but are listed here as positive since the true sense of the relationship is positive.


ing Key). Again it is interesting to note that 
this is the one possible combination of three 
out of the four tests that does not include the AIA Orient. Test. Addition of AIA Orient. Test to the cluster of three tests did not appreciably increase the predictive value of the cluster. (Both R’s were .55 when rounded to two decimals. In theory, adding a variable to a cluster can never decrease R, but the increment in this instance was so small that it is not observable when the R’s are rounded.)
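Tables 3 and 4 amount to computing R for every subset of predictors. A minimal sketch, assuming the standardized normal equations R_xx β = r_xy with R² = β · r_xy; the three-test correlation matrix is hypothetical, and the small Gaussian-elimination solver is included only to keep the example dependency-free. It also illustrates the point made above: a superset of predictors never yields a smaller R.

```python
import itertools
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def multiple_r(rxx, rxy, subset):
    """R of the criterion with the predictors whose indices are in `subset`."""
    sub = [[rxx[i][j] for j in subset] for i in subset]
    r = [rxy[i] for i in subset]
    beta = solve_linear(sub, r)                  # standardized regression weights
    r_squared = sum(b * c for b, c in zip(beta, r))
    return math.sqrt(max(r_squared, 0.0))


names = ["T1", "T2", "T3"]                        # hypothetical tests
rxx = [[1.0, 0.2, 0.3],
       [0.2, 1.0, 0.1],
       [0.3, 0.1, 1.0]]
rxy = [0.40, 0.30, 0.05]

for k in (2, 3):
    for subset in itertools.combinations(range(3), k):
        label = ", ".join(names[i] for i in subset)
        print(f"{label}: R = {multiple_r(rxx, rxy, subset):.3f}")
```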

The findings are all based upon the assumption that accounting grades are an acceptable criterion for judging the relative validity of the tests under consideration. While grades are known to be less reliable than is desired, they are the criterion of performance most generally used in college courses.


Table 3

Coefficients of Multiple Correlation Between Various Pairings of Test Scores and Grades in Elementary Accounting in the Fall of 1949*

Pairs of Test Scores
ACE Psychol. Exam. and Strong Interest Blank, Acctg. Key
OSU Psychol. Test and Strong Interest Blank, Acctg. Key
ACE Psychol. Exam. and AIA Orient. Test
OSU Psychol. Test and ACE Psychol. Exam.
OSU Psychol. Test and AIA Orient. Test
AIA Orient. Test and Strong Interest Blank, Acctg. Key

*N = 95.

It is obvious that this study is restricted 
to the relationship between the test scores 
considered and grades. It does not neces- 
sarily follow that the same relationship exists 
between the test scores and success in actual 
employment in the field of accounting. 

For example, it is possible that many col- 
lege professors weigh mastery of the theoreti- 
cal aspects of accounting more heavily than 
practical skills in accounting when awarding 
grades. On the other hand, success in em- 
ployment in accounting may be more closely 
related to the practical skills. These, of 
course, are hypothetical assumptions, but they 
do illustrate the danger of drawing conclusions from this study concerning the relationship between AIA Orientation Test and success in employment in accounting.


Table 4

Coefficients of Multiple Correlation Between All Possible Combinations of Three and Four out of Four Tests and Grades in Elementary Accounting

Combinations of Test Scores
ACE, OSU, and SVIB
ACE, OSU, SVIB, and AIA Orient. Test
ACE, SVIB, and AIA Orient. Test
OSU, SVIB, and AIA Orient. Test
ACE, OSU, and AIA Orient. Test


Summary 


1. If a single test is to be utilized in pre- 
dicting grades in elementary accounting, ACE 
Psychol. Exam. and OSU Psychol. Test are 
preferable to the AIA Orientation Test. 

2. If two tests are to be used, neither of 
the two best combinations of two out of four 
tests includes the AIA Orientation Test. 


3. If three out of the four tests are to be 
used, the best combination of three does not 
include the AIA Orientation Test. The ad- 
dition of AIA Orientation Test to the cluster 
of three does not improve the predictive value 
of the cluster. 

4. It does not necessarily follow that the 
same relationship would be obtained if the 
criterion used were success in professional em- 
ployment as an accountant. 


Received May 28, 1952. 
Suppose a selection plan has been validated
and the multiple R turns out to be about .60. 
What does a validity coefficient of this size 
indicate about the selective efficiency of the 
plan? 

The index of predictive efficiency (E) is not a satisfactory measure. For an R of .60:

$$E = 100\left(1 - \sqrt{1 - R^{2}}\right) = 100\,(1 - .80) = 20\%$$


which represents the per cent improvement 
over chance in predicting individual criterion 
scores. But ordinarily a selection plan is 
designed merely to pick a group of successful 
workers and to eliminate a group of unsuc- 
cessful workers—not to predict the criterion 
score of each individual. 
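The index E can be tabulated for a few validity coefficients to show how modest it looks even for respectable R's. A minimal sketch of the formula above; the function name is hypothetical.

```python
import math


def predictive_efficiency(r: float) -> float:
    """Index of predictive efficiency E = 100 * (1 - sqrt(1 - r^2)), in per cent."""
    return 100 * (1 - math.sqrt(1 - r * r))


for r in (0.50, 0.60, 0.70, 0.80):
    print(f"R = {r:.2f}: E = {predictive_efficiency(r):.1f}%")
```

For R = .60 this gives 20.0%, and even R = .80 reaches only 40.0%, which is why E understates the usefulness of a plan whose job is merely to sort applicants into groups.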

What we need is an index of selective ef- 
ficiency that will indicate how well we can 
pick such groups. Particularly we are inter- 
ested in accepting as many as possible of the 
potentially superior workers and rejecting as 
many as possible of the potentially inferior 
workers. Let us call the highest quarter on 
the job criterion “superior workers,” and the 
lowest quarter on the job criterion “inferior 
workers.” The middle half will be “mediocre 
workers.” Then: 


Successes of the plan are superior workers 
accepted and inferior workers rejected. 


Failures of the plan are superior workers re- 
jected and inferior workers accepted. 


Suppose we have two hundred applicants 
and choose half of them with the aid of a 
brown Stetson hat. If we obtain job cri- 
terion scores for all of them, we can expect 
to find something like this: 


             Inferior   Mediocre   Superior
Accepted        25         50         25
Rejected        25         50         25

Successes: 25 + 25 = 50
Failures: 25 + 25 = 50


Selection on a chance basis leads in the 
long run to an equal number of successes 
and failures. 

But suppose we use a selection plan hav- 
ing a validity coefficient of about .60. If we 
obtain job criterion scores for all two hun- 
dred men, we should find something like this: 


             Inferior   Mediocre   Superior
Accepted        10         50         40
Rejected        40         50         10

Successes: 40 + 40 = 80
Failures: 10 + 10 = 20

Improvement over chance:

$$\frac{80 - 20}{80 + 20} = 60\%$$


With any actual sample of 200 applicants, 
the figures might not come out in this exact 
symmetrical pattern but the per cent im- 
provement over chance should be substan- 
tially the same. 
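The counting rule behind S, and the claim that an actual sample should give substantially the same figure, can be sketched as follows. The simulation is a stand-in for the tetrachoric chart: it assumes predictor and criterion are bivariate normal with correlation R, accepts applicants above the predictor cutoff, and treats the top and bottom criterion quarters as superior and inferior workers. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def selective_efficiency(successes: int, failures: int) -> float:
    """S = (successes - failures) / (successes + failures)."""
    return (successes - failures) / (successes + failures)


def simulate_selective_efficiency(r: float, accept_fraction: float,
                                  n: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of S, assuming a bivariate normal predictor and criterion."""
    rng = random.Random(seed)
    std = NormalDist()
    x_cut = std.inv_cdf(1 - accept_fraction)  # accept applicants above this score
    y_top = std.inv_cdf(0.75)                 # superior: top criterion quarter
    y_bottom = std.inv_cdf(0.25)              # inferior: bottom criterion quarter
    noise_sd = math.sqrt(1 - r * r)
    successes = failures = 0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        x = r * y + noise_sd * rng.gauss(0.0, 1.0)  # corr(x, y) = r
        accepted = x > x_cut
        if y > y_top:                # superior worker: accepting is a success
            if accepted:
                successes += 1
            else:
                failures += 1
        elif y < y_bottom:           # inferior worker: rejecting is a success
            if accepted:
                failures += 1
            else:
                successes += 1
    return selective_efficiency(successes, failures)


print(selective_efficiency(50, 50))  # chance selection: 0.0
print(selective_efficiency(80, 20))  # the R = .60 example: 0.6
for frac in (1 / 3, 1 / 2, 2 / 3):
    print(round(simulate_selective_efficiency(0.60, frac), 2))
```

With R = .60 and half the applicants accepted, the simulated S comes out near .63, a little above the .60 of the rounded illustration. Accepting one-third or two-thirds gives the same value apart from sampling error, because the bivariate normal is symmetric.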

With the aid of a chart for computing 
tetrachoric r,¹ it is possible to determine the
theoretical improvement over chance cor- 
responding to any obtained value of R for 
any proportion of total applicants accepted. 
Some typical values of the index of selective 
efficiency (S) are shown below: 

Proportion Accepted    R = .50   R = .60   R = .70   R = .80
One-third                48%       57%       66%       76%
One-half                 52%       63%       74%       85%
Two-thirds               48%       57%       66%       76%


For all practical purposes we may say: the index of selective efficiency (S) has the same numerical value as the validity coefficient, if we are accepting something between one-third and two-thirds of the applicants.

In our experience, the index of selective efficiency (S) has proved a useful way of explaining the meaning of a validity coefficient to someone who is unfamiliar with statistics.


Received June 16, 1952.


¹ Jenkins, W. L. A single chart for tetrachoric r. Educ. psychol. Measmt., 1950, 10, 142-144.
A Note on Techniques in the Investigation of Accident Prone Behavior *


Lawrence L. LeShan
Roosevelt College

and

Jim B. Brame
University of Houston


In the past several years there have appeared, in the psychological literature, a large number of studies of accident proneness. Many of the articles which have appeared have lost some of their potential value due to a lack of clarity concerning the special problems of technique which exist in this field. It is the purpose of this paper to point up a few of these problems.¹


Method


The usual method of finding a population of accident prones includes either an interview technique or a survey of accident records in an industrial organization, a police file, insurance records or some source of this sort. Each of these has dangers attached to it.

The interview. Accident prones have a strong tendency to “forget” accidents. A few examples may serve to illustrate this.

One man revealed half a dozen major accidents. Intensive interview probing found no others. At the end of the interview, he was asked to strip, and his body was examined for scars. A previously undisclosed scar on the right side of his chest was called to his attention. He then remembered that three years previously a bulldozer had rolled over him, injuring his back and breaking three ribs.

Another man was leaving the interview room when the examiner (JBB) noticed he had a bent distal phalanx of the right little finger. On inquiry the patient said he “just remembered” that he broke that finger the previous year.


* The authors accept full responsibility for this paper. It is a pleasure, however, to thank Thomas Fansler, Research Director of the National Safety Council, for raising and clarifying many of these points.

¹ The authors became interested in this problem as a result of being involved in research concerning the psychodynamics of individuals with a history of repeated accidents. (The results of this study were published in Psychiatry: Journal for the Study of Interpersonal Processes, Vol. 15, No. 1, 1952, pp. 73-80.) As part of this research, thirty-five accident-prones were interviewed by one of the authors (JBB). The other part of the study (consisting of analyses of projective tests on sixty-five accident-prones and seventy-five equated non-accident-prones and approximately twenty intensive interviews with accident-prones) was completed by the other author of this paper.

A twenty-one year old male with several 
accidents denied any further accidents dur- 
ing thirty minutes of detailed questioning. 
Towards the end of this period he started 
rubbing his right elbow. On specific ques- 
tioning, he recalled that he had broken his 
arm when he was eighteen. 

Behavior of this sort is by no means infrequent. In the experience of the authors, it is the general rule rather than the exception.²

For this reason, a great deal of skepticism 
must be attached to results gained by the 
written questionnaire method also. An ex- 
perience of one of the writers (LLL) illus- 
trates this point. A questionnaire was filled 
out by 40 accident repeaters who had been 
called into a state driving clinic. They had 
all had at least three auto accidents in 12 
months. Over half of them did not remem- 
ber all three accidents. 

When an interview technique is being used 
to obtain an accident history, the subject 
should be questioned on a year-by-year basis. 
This would include the jobs worked at and 
the particular hazards of each job; vehicles 
driven, repairs and their cost; sports participated in, falls and bruises. Special reference should be made to burns and scalds since these are not often thought of as “accidents” by the subject.

² No quantitative estimate as to how large a percentage of their accidents these individuals forget can be made, since we do not know how many accidents were not recalled at all in our interviews. However, in thirty-five interviews, at least thirty of the subjects recalled several more accidents after careful probing than they had when simply asked to list all the accidents that they had had and then were given plenty of time and a sympathetic listener.

The interview frequently makes people defensive about their accident record, as they may see implications of punitive intent. They may, therefore (in addition to the accidents that they have repressed), deliberately not state others which they do consciously remember. For this reason, careful attention must be paid to the psychological atmosphere of the interview. A good relationship is essential to accurate data collection. We feel that, by and large, an authoritarian relationship tends to produce markedly less data than an egalitarian one.³ A procedure that is often helpful is to express interest in the general health history and to record all illnesses.

One point about the interview which should 
be considered in research design is that it is 
essential to gather data on control groups in 
the same manner it is gathered on groups of 
accident repeaters. An intensive probing in- 
terview covering the entire life-span of the 
individual produces a surprisingly large num- 
ber of accidents in the general run of the 
population. Since a definition of accident 


proneness implies that the individual con- 
cerned has a higher accident rate than his 
peers, both experimental and control groups 
have to be evaluated with the same technique. 

The use of accident records. Probably the most common technique for studying accident-prones is to use the data of the safety departments of police or insurance firms, industrial firms, etc. There are several dangers inherent in this method, two of which might be mentioned briefly.

Although this is probably a valid way of collecting data on experimental groups, it is
a dubious procedure for control groups. We 
do not know how many individuals are acci- 
dent prone at home and not at work. If a 
man has a high off-duty accident record and 
a low on-duty accident record and we study 
him as a non-accident prone since we have 
only the plant statistics, he is likely to con- 
fuse our data, to say the least. 

We know so little about the accident-prone that we do not know if he is more or less prone to report his accidents to the plant infirmary or to the police, if he tends to report only certain types of accidents, etc.

³ This statement is not the result of any experimental work, but simply an impression based on experience with varied types of interviews.


Defining an Accident 


This is a complex and difficult problem. 
Generally we consider an accident to be a 
mishap with a sudden onset. However, this 
by no means solves our problems. Parenthetically, it might be stated that the Workmen’s Compensation Act of the State of Virginia has a four-line definition of “injury” which is followed by seventeen single-spaced pages of clarification in fine print.

We have little clear understanding of the 
difference between a disease and an accident. 
If a workman habitually neglects washing his 
hands after he finishes work with coal-tar 
products and develops a skin irritation which 
incapacitates him, how does this differ from 
typical accident-prone behavior in which the 
individual injures himself by neglecting 
elementary safety precautions? Should we 
count this as an accident? 

Even though we eliminate occupational 
disease and use only traumatic injury, other 
problems arise in the same area. We see a 
report of a man who has 15 back-sprains. 
The medical report states he has a “weak 
back.” Is each sprain to be counted as an 
accident? Is there a difference between the 
man who has this particular disorder and 
(granted freedom of choice) repeatedly gravi- 
tates to jobs calling for heavy lifting and a 
similarly handicapped man who takes posi- 
tions which will not put such a strain on his 
back? 

The difference between a chargeable and a 
non-chargeable accident is often used in stud- 
ies but is frequently more apparent than real. 
Surveys of trucking company records by one 
of the writers (LLL) have shown that in- 
dividuals who have high rates of chargeable 
accidents tend also to have high rates of non- 
chargeable accidents. Many accidents which 
appear to be non-chargeable on superficial ex- 
amination are chargeable when carefully ex- 
amined. One accident prone had had four 
automobile accidents while he was sitting in 
the front seat of a car and someone else was 
driving. He had, he said, “generally been 
talking to the driver when it happened.” In 
one of the accidents he had hurt his elbow 
badly as it had been outside the ventilator 
window when the car crashed. This statement stimulated the interviewer to probe at
some length into exactly what had happened. 
After five minutes a picture emerged that was 
quite different from the earlier “non-charge- 
able” one. It is true that he had been sitting 
beside the driver, but he had decided to clean 
the windshield. He thrust his hand with a 
towel through the ventilator window. At 50 
MPH, the towel flapped over and covered 
most of the windshield, the driver was blinded 
and the crash occurred. 

Another type of problem in defining the 
accident is illustrated by an individual who 
had no history of injuries or accidents (as 
they are usually defined). However, in- 
vestigation showed that he had been fired 
from his last position (as a pharmacist) for 
“making mistakes.” At the time he was seen 
he was working as a pilots’ mechanic. This 
man had no automobile crashes or falls in his 
background. He simply made minor errors 
in work of such a nature that the errors could 
have disastrous effects. Definitions of acci- 
dent made for a particular study should 
clearly exclude or include individuals of this 
sort. 


General Considerations 


There are no agreed-on definitions of “acci- 
dent,” “injury,” or “accident prone.” Each 
study must first decide what it is attempting 
to find out. In terms of various factors such 
as population studied, purpose of research, 
techniques available, etc., definitions can be 
made. 

This, perhaps, can be most clearly seen in 
defining the accident-prone. There is gen- 
eral agreement that he should have an acci- 
dent rate higher than that of his peers, but as 
to how far above the mean of his peer group 
he must be there is no agreement. Shall we 
cut off the upper 1% of our population and 
label them “accident-prones,” or shall we use 
the upper 5%, the upper 25%, or the upper 50%? There is no agreement here.

The problem can be approached in an- 
other way. Rather than examine (by implication) the accident liability of the specific
environment we are studying, as was implied 
in the last paragraph, we can examine the 
accident liability of the individual. We can 
then use criteria such as one accident per 


year for at least 5 years, or 3 accidents every 
2 years for 10 years, etc. In this way the 
State of Oregon labels a man an “accident-repeater” if he has had 3 accidents in any
12-month period. This only includes 4% of 
state drivers, but the 4% have 40% of the 
accidents. (Unlisted memo. in the files of 
the National Safety Council.) 
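The Oregon definition is a sliding-window rule on an individual's accident dates. A minimal sketch, assuming "any 12-month period" means any span shorter than 365 days (the memo's exact convention is not given); the function name and the accident dates are hypothetical. After sorting, it is enough to check each run of three consecutive accidents, since any three accidents inside one window imply three consecutive ones whose span is no wider.

```python
from datetime import date, timedelta


def is_accident_repeater(accident_dates, count=3, window=timedelta(days=365)):
    """True if some `count` accidents fall within one sliding `window`."""
    dates = sorted(accident_dates)
    for i in range(len(dates) - count + 1):
        # dates[i] .. dates[i + count - 1] bracket `count` consecutive accidents.
        if dates[i + count - 1] - dates[i] < window:
            return True
    return False


# Hypothetical accident histories.
driver_a = [date(1950, 1, 10), date(1950, 6, 1), date(1950, 12, 20)]
driver_b = [date(1949, 1, 1), date(1950, 1, 1), date(1951, 1, 1)]
print(is_accident_repeater(driver_a))  # True: three accidents within 344 days
print(is_accident_repeater(driver_b))  # False: one accident a year
```

Driver B would still qualify under a per-individual criterion such as "one accident per year", which is the distinction the paragraph above draws.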

A problem here is that the total accident 
record of some individuals is not consistent 
by a year-by-year, or even decade-by-decade, 
analysis. Often a person may normally be 
non-accident-prone but for a period of two 
to five years show a high accident rate and
then at the end of this time, return to his 
former low level of accidents. 

In the design of research, it may be un- 
wise to use as controls only individuals with 
low accident rates. There is no evidence 
that this is not a special group with different 
characteristics than are found in the normal 
population. Until this problem has been in- 
vestigated, controls should be taken from the 
center of the accident curve rather than from 
the lower extremity. 

Good research design will demand that ac- 
count be taken of both the accident liability 
of the specific environment and the liability 
of the individual. Fleming and Dickinson's excellent paper⁴ discusses the relationship of personal and situational liability. They state, in part, “A high accident potential and an accident-prone driver make for a high accident expectancy. A high accident potential and a normal driver make for an accident possibility” (p. 171). No study which does
not evaluate both the individual and the 
group accident rate can expect to produce 
clear cut results. 


Summary 

Special problems exist in the design of studies of accident prone behavior. A few of these are briefly discussed. Difficulties in finding the accident rate of an individual, defining an accident, and delimiting the accident-prone and non-accident-prone groups are pointed out.


Received May 26, 1952. 
⁴ J. Fleming, Jr., and J. J. Dickinson. Accident proneness and accident law. Harv. law Rev., 1950, 63, 169.
Our main problem in this study is to test
the efficiency of a measuring instrument to 
predict the ability of a teacher to effect 
harmonious interpersonal relations in the 
classroom. We believe that harmonious in- 
terpersonal relations in the classroom are de- 
sirable. We also believe that the teacher is 
a key figure in the kind of relationship that 
prevails. If good interpersonal relations are 
obtained between teacher and students, then 
it follows that the teacher and students will 
work together in a social atmosphere of co- 
operative endeavor and with a mutual feeling 
of security. Also the students will be mo- 
tivated to learn the material at hand more 
easily, and will have an opportunity to do 
so in a manner which is most efficient for 
them individually. If, on the other hand, 
the social climate in the classroom is char- 
acterized by tension, fear, and submission on 
the part of the students, the student is apt 
to have little motivation to learn; and, as a 
by-product, numerous disciplinary problems, 
inattention, and restlessness will result. If 
there is mutual distrust and hostility between 
the teacher and students, probably little 
learning will occur. 

We have assumed that a teacher's atti- 
tudes resulting from his life experiences will 
have a noticeable effect on the kind of rela- 
tionships which this teacher creates in his 
classroom. These attitudes are presumed to 
be a result of a multitude of factors such as 
values, personality traits, intelligence, gen- 
eral knowledge, and teaching skills. If we 

* This study was a part of the University of Missouri Agricultural Experiment Station Project No. G-48, which was a part of the Office of Naval Research Project NR 154-111.


are able to measure these attitudes satisfac- 
torily, we then should be able to predict to 
a significant degree the kind of relationship 
which will be obtained in the classroom. 
Specifically our problem is: How well will 
the Minnesota Teacher Attitude Inventory 
(MTAI) predict interpersonal relations in 
the classroom? 


Procedure 


The predictor. The MTAI was selected as the predictor for this study since it attempts to measure the kinds of teacher attitudes which are relevant to teacher-student relations. Two studies of the validity of the MTAI have already been reported (2, 3).
In each of these it was found that the MTAI 
would predict a three-fold criterion of teacher- 
student relationships to the extent indicated 
by a correlation coefficient of .59. The 
MTAI contains 150 attitude statements to 
which the teacher responds with one of five 
possible responses. The scoring system was 
determined by purely empirical means (2). 
Responses to the MTAI were secured from 
one group of teachers judged to be superior 
in their relations with students and another 
group judged to be inferior in their relations 
with students. The per cent of each group 
choosing the various response categories was 
computed and the significance of the difference between these percentages was determined. A significant difference in percentage favoring the superior group was scored “+1”; a significant difference favoring the inferior group was scored “−1”; all nonsignificant differences were scored “0.” Following is an example:
Item: Most children are obedient.

                     Strongly                                   Strongly
                     Agree     Agree    Uncertain   Disagree    Disagree
Superior group        34%       58%        4%          3%          1%
Inferior group        18%       64%        4%         13%          1%
Differences in %      +16        −6         0         −10           0
Scoring               +1          0         0          −1           0

It can be argued, on logical grounds, that the “uncertain” and “strongly disagree” response categories should be scored “−1.” However, scoring systems determined by logical face validity have in the past been found to be such notoriously poor predictors of psychological functions that the authors of the MTAI decided to use a scoring system based on empirical data only.

The criterion. A major task was to describe adequately the kinds of relationships which existed in each of several classrooms. We obtained three estimates of this relationship from different sources. First we obtained such an estimate from the students in each classroom. This was obtained through a 47-item questionnaire or inventory about “My Teacher,” which was administered to all students in attendance the day we contacted the class. This inventory is the same as the one used by Leeds (2). Such questions as these were asked: “Do you like school?” “Is this teacher often bossy?” “Is this teacher usually kind to you?” The inventory was scored “rights minus wrongs.” The possible range in scores was +47 to −47. Therefore, a score of zero indicates that the student made as many negative criticisms of the teacher as he made positive statements about him. The zero score would be below that expected for an average teacher. The mean score on the student inventory for each class was obtained. This mean score constituted the evaluation by the students of the interpersonal relations in that particular classroom.

The second evaluation of the classroom relations was made by the principal of the school. The principal made his evaluation in the form of a rating scale. This is the same rating scale which was designed and used by Leeds (2). Items 1 through 6 of the rating scale were scored on a 5-point scale, thus yielding a possible range in scores of 6 through 30. When the ratings were inspected, it was found that there were wide
discrepancies among the means of ratings 
made by the various principals. We con- 
sidered that these discrepancies could be due 
to: (a) wide variation in the leniency of the 
raters; or (b) wide differences in the quality 
of teachers in the various school buildings. 
It was necessary to assume one or the other 
in analyzing our data. We chose the former. 
Consequently, the principal ratings were ex- 
pressed as deviations from the mean of the 
particular rater. That is, all the ratings 
which each principal made were averaged and 
each teacher’s score was expressed as a devia- 
tion from that mean. In this way we, in 
effect, equated all schools on the quality being 
rated. This assumption of equality of all 
schools in classroom relations is not neces- 
sarily justified, but it was our opinion that 
less error would result with this technique 
than to assume that all raters (principals) 
were equally lenient in their ratings. It 
would have been desirable to equate the 
variability of each set of ratings in addition 
to equating the means. This was not done 
because several sets of ratings contained only 
three or four cases. 
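The leniency correction described above, expressing each principal's ratings as deviations from that principal's own mean, can be sketched as follows. The principals, teachers and ratings are hypothetical; as the paragraph notes, only the means are equated, not the variabilities.

```python
from statistics import mean


def deviation_scores(ratings_by_principal):
    """Express each teacher's rating as a deviation from the mean of that teacher's principal."""
    deviations = {}
    for principal, ratings in ratings_by_principal.items():
        rater_mean = mean(ratings.values())   # this principal's leniency level
        for teacher, rating in ratings.items():
            deviations[teacher] = rating - rater_mean
    return deviations


# Hypothetical ratings on the 6-30 scale: P1 rates leniently, P2 strictly.
ratings = {
    "P1": {"t1": 20, "t2": 24},
    "P2": {"t3": 12, "t4": 18, "t5": 15},
}
print(deviation_scores(ratings))
```

Teacher t4 (raw rating 18) ends up above teacher t1 (raw rating 20), because t4 is compared only with the other teachers rated by the same, stricter principal.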

The third estimate of the classroom rela- 
tions was made by two observers from our 
research team. Each observer visited the 
classroom at different times and observed 
the class in process for thirty minutes to an 
hour. Independent of each other they re- 
corded their observations on a rating scale. 
Items 1 through 5 on the rating scale were 
scored on a 5-point scale for each item, thus 
yielding a possible range in scores from 5 
through 25. Each observer’s ratings were 
converted to standard scores (based on his 
own distribution) and then the two stand- 
ard scores were averaged to arrive at the 
criterion of “mean observer rating.” The 
“mean observer ratings” constituted the third 
estimate of the criterion of classroom relations.

Each of the three above criteria (student ratings, principal ratings as deviation scores, and mean observer ratings) was converted to standard scores, and the three standard scores were summed. The sums were in turn converted to standard scores and called the composite criterion. This last step was done merely to facilitate inspection of our data.
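The observer averaging and the composite criterion are both built from standard scores. A minimal sketch, assuming population standard deviations and mean-0, SD-1 scores (Table 1's means near 50 suggest the published values were rescaled, which does not change any correlation); the function names and class data are hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(values):
    """Convert raw values to standard scores (mean 0, population SD 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def mean_observer_rating(obs_x, obs_y):
    """Standardize each observer on his own distribution, then average the two."""
    zx, zy = standard_scores(obs_x), standard_scores(obs_y)
    return [(a + b) / 2 for a, b in zip(zx, zy)]


def composite_criterion(*criteria):
    """Standardize each criterion, sum per class, and standardize the sums."""
    z_lists = [standard_scores(c) for c in criteria]
    sums = [sum(per_class) for per_class in zip(*z_lists)]
    return standard_scores(sums)


# Hypothetical data for three classes.
observers = mean_observer_rating([15, 20, 25], [16, 18, 20])
students = [10, 20, 30]          # class mean inventory scores
principals = [-1, 0, 1]          # deviation scores
print([round(v, 3) for v in composite_criterion(students, principals, observers)])
```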

The sample. The sample for this study 
consisted of 77 public school classes in cen- 
tral Missouri. Grades four through ten in 
four school systems were represented. The 
population of these cities varied from 7,500 
to 26,000. There was only one teacher con- 
tacted in these four school systems who de- 
clined to participate in the study. There 
were 82 classes in the original group from 
these four cities. The sample was reduced 
to 77 due to incomplete data. In these four 
cities all Negro children attended a school 
separate from the schools for the white chil- 
dren. None of the Negro classes was in- 
cluded in this study. The grades included 
varied from city to city, depending upon the 
organization of that particular school sys- 
tem. In grades four through six, the classes 
met as the usual elementary school class. 
In grades seven through ten, the grades were 
organized on a typical high school plan. Of 
the 77 teachers there were 8 male teachers, 
48 married female teachers and 21 single 
female teachers. 

MTAI, Form A,¹ was administered to each
teacher in the study. The relationship was 
determined between this predictor and each 
of the three estimates of the criterion, and 
the combined criterion. 


Results 


Table 1 presents the means and standard 
deviations for the predictor and the various 
criteria. The means of the ratings made by 
our two observers are quite similar; however, the variance of ratings made by observer X was significantly greater than for observer Y (F = 2.13; n₁ = n₂ = 76; P < .001).
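The variance comparison is an F ratio, larger variance over smaller. A minimal sketch with hypothetical ratings follows; note that this textbook form of the test treats the two sets of ratings as independent samples, even though here both observers rated the same classes. If SciPy is available, `scipy.stats.f.sf(F, dfn, dfd)` gives the upper-tail probability for the ratio.

```python
from statistics import variance

def variance_ratio(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator), larger variance on top.

    Uses the n - 1 (sample) variance for each set of ratings.
    """
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical ratings by two observers of the same five classes.
observer_x = [10, 14, 18, 22, 26]
observer_y = [15, 17, 18, 19, 21]
print(variance_ratio(observer_x, observer_y))  # (8.0, 4, 4)
```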

¹ This was the unpublished form of the MTAI.
The form which was published subsequent to this
study (1) varies in a few minor details only from
the one used here.


Table 1

Summary Statistics for the Predictor (MTAI)
and the Various Criteria
Note: N = 77 classes.

                                          Standard
Variable                           Mean   Deviation
Criteria:
(1) Observer X Ratings             18.7
(2) Observer Y Ratings             18.1
(3) Mean Observer Ratings          49.8†
(4) Student Ratings                24.4
(5) Principal (deviation score)    50.4†
(6) Composite                      49.7†
Predictor:
MTAI                               27.5

† Based on standard scores computed from a few
more cases than the 77 teachers in the correlational
analysis. Values other than those marked (†) are based
on raw scores.


The means of the two observers’ ratings (18.7 and 18.1)
are only slightly greater than the middle or average
point (15) of the 5-to-25 rating scale. This agrees
with our more subjective impression that we were
dealing with a typical or average group of teachers.

The ratings made by the principals are in
sharp contrast with those made by our observers.
The mean of ratings made by the
principals was 25.0, while the highest possible
rating was 30. The principals’ ratings
would characterize the group of teachers as
highly superior in their relations with students.

The mean score on the student inventory
for all the teachers was 24.4, where the possible
range of scores was +47 to −47. There
are no norms available with which to compare
this value.

The mean MTAI score of 27.5 is estimated 
to be about average or slightly below average 
for experienced teachers. No directly com- 
parable norm group was available; however, 
norms on somewhat similar groups (beginning 
teachers and graduate students in education 
who had at least two years’ teaching experi- 
ence) suggest the above interpretation. 

There appears to be fair agreement among 
the means of the various measures with the 
exception of the principals’ ratings. 

Table 2

Intercorrelation of the Predictor (MTAI)
and the Various Criteria
Note: N = 77 classes.

                                            Students’   Principals’
                                            Ratings     Ratings       MTAI
(1) Observers’ Mean Ratings¹                            −.12          .40**
(2) Students’ Ratings                                   .46**         .49**
(3) Principals’ Ratings (deviation scores)                            .19
(4) Composite of (1), (2), (3)                                        .46**
(5) Composite of (1), (2)                                             .50**

¹ The correlation between the ratings of the two
observers was .33.

** Significantly greater than zero at the 1 per cent
level of confidence.

The intercorrelations among the various
criterion measures, as shown in Table 2, were
quite low with the exception of the correla- 
tion of .46 between principal and student 
ratings. The correlations between the MTAI 
scores and the various criteria, singly and 
combined, were significantly greater than zero 
except for the principals’ rating: students’ 
ratings = .49; mean observers’ ratings = .40; 
principals’ ratings = .19; composite of the 
three criteria = .46; and the composite of 
observers’ and students’ ratings = .50. Thus, 
it appears that with the MTAI we can pre- 
dict the kind of interpersonal relations which 
will exist in the classroom about as well as 
we can predict academic performance by use 
of intelligence tests. Presumably we are
measuring an aspect of personality which we
may refer to as “teaching personality.” By
“teaching personality” we mean those characteristics
of the teacher’s behavior tendencies
which are associated with the teacher’s
ability to establish harmonious working relations
with students.
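Whether a correlation from 77 classes exceeds zero can be checked with the usual t test for r, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The sketch below applies it to the five coefficients just reported; the critical value 2.64 (two-tailed, 1 per cent, df = 75) is our approximation from standard t tables, not a figure from the article.

```python
import math

def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero, df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Approximate two-tailed 1 per cent critical t for df = 75 (from t tables).
T_CRIT_01 = 2.64

reported = [("students", .49), ("observers", .40), ("principals", .19),
            ("composite of three", .46), ("observers + students", .50)]
for label, r in reported:
    t = t_for_r(r, 77)
    print(f"{label:22s} r = {r:.2f}  t = {t:.2f}  significant: {t > T_CRIT_01}")
```

Only the principals' coefficient falls short, in line with the text.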

The results of this study are in general 
similar to the ones conducted by Leeds (2, 
3). The one major discrepancy between these 
similar studies is in the principals’ ratings. 
In Leeds’ studies the MTAI scores correlated
with the principals’ ratings with coefficients
of .43 and .46. These are somewhat higher
than the coefficient of .19 obtained in the present
study. The correlations of the MTAI scores
with each of the other estimates of the cri- 
terion, that is, the observers’ ratings and the 
students’ ratings, were rather similar in the 
two studies. The correlation of MTAI scores 
with the composite criterion in this study was 
.46 as compared with .59 and .59 in Leeds’ 
studies. It would appear then that we have 
a good start in finding predictors for our 
criterion of human relations in the classroom. 


Received June 16, 1952. 
With the increasing use of psychological
tests by industry as an aid in selecting per- 
sonnel, considerable interest has developed 
in the problem of malingering on such tests. 
This is especially true of the pencil and paper 
type of interest and personality inventory. 
Bordin (1), Longstaff (4), and Strong (7) 
have shown that the Strong Vocational Inter- 
est Blank is fakable. Longstaff (4) also dem- 
onstrated the fakability of the Kuder Prefer- 
ence Record. Meehl and Hathaway (6) have 
found the Minnesota Multiphasic Personality 
Inventory fakable. Tiffin (8) reports a study 
which showed that the Humm-Wadsworth 
Temperament Scale could be faked. Wes- 
man (9) has demonstrated similar results for 
the Bernreuter Personality Inventory. 

At least four different methods have been 
used to try to correct this weakness. One: 
The development of scoring keys to detect 
faking. The “L” scale of the MMPI (2) 
is a good example. Two: Development of 
“suppressor variable” scales to correct scores 
for malingering. Notable in this connection 
is the work of Meehl and Hathaway (6). 
Three: Development of keys based on subtle 
and obvious items (10). Four: Development 
of tests using the “forced-choice technique” 
which it was hoped would be less susceptible 
to faking. An example of this approach and 
the subject of this paper is the Jurgensen 
Classification Inventory (3). The essential 
feature of such inventories is to force the 
subject to choose what he considers the best 
and worst items from a group of items which 
represent only “good” or only “bad” traits. 
It was hoped that this would get away from 
the weakness of presenting lists of inter- 
mixed “good” and “bad” traits where a ma- 
lingerer could state that he had only “good” 
traits and did not possess any of the “bad”
ones. 
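The forced-choice idea can be sketched as a scoring routine. In this illustration, which does not reproduce the Jurgensen Classification Inventory's actual format or key, each block holds only favorable (or only unfavorable) statements, the subject marks the one most like and the one least like himself, and a hypothetical empirical key awards points for particular marks. Because every statement in a block looks equally creditable, the subject cannot easily tell which mark the key rewards.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    """One forced-choice group of only favorable (or only unfavorable) statements."""
    statements: list
    most_key: dict = field(default_factory=dict)   # points for marking "most like me"
    least_key: dict = field(default_factory=dict)  # points for marking "least like me"

def score(blocks, answers):
    """Sum key points; answers[i] = (most-like, least-like) statement for block i."""
    if len(blocks) != len(answers):
        raise ValueError("one answer pair is needed for every block")
    total = 0
    for block, (most, least) in zip(blocks, answers):
        if most == least or most not in block.statements or least not in block.statements:
            raise ValueError("mark two different statements from the block")
        total += block.most_key.get(most, 0) + block.least_key.get(least, 0)
    return total

# Hypothetical blocks and key: nothing in the wording signals which mark earns credit.
blocks = [
    Block(["is dependable", "is cheerful", "is ambitious", "is tactful"],
          most_key={"is dependable": 1}, least_key={"is cheerful": 1}),
    Block(["is careless", "is moody", "is stubborn", "is timid"],
          most_key={"is stubborn": 1}, least_key={"is timid": 1}),
]
print(score(blocks, [("is dependable", "is cheerful"), ("is moody", "is timid")]))  # 3
```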

Mais (5) developed and cross-validated 
a “self confidence” key for the Jurgensen 
Classification Inventory. He then had col- 
lege students take the test honestly and dis- 
honestly, i.e., trying to fake a high score in 
self-confidence. He found the mean score for 
his group changed from −5.9 (honest) to
6.9 (faked). This difference of 12.8 was
significant at the .01 level. The Pearsonian
correlation between the two scores was only
.17.

Mais’ study gave adverse data on the 
Jurgensen Classification Inventory. How- 
ever, it was not a crucial study. The Classi- 
fication Inventory was developed for use in 
personnel selection. In such industrial use, 
applicants do not know what “traits” are 
being measured. In fact, most keys are not 
based on traits but on over-all job success. 
It may be one thing to raise a score on a 
specified and named trait such as self-con- 
fidence and another thing to obtain a higher 
score on undefined job success. 

The present study was designed to investi- 
gate further the fakability of the Classifica- 
tion Inventory. Two groups of University 
of Minnesota students in personnel psychol- 
ogy courses served as subjects. Group A con- 
sisted of 41 juniors, seniors and graduate 
students, the majority of whom had com- 
pleted numerous courses in psychology and 
industrial relations. Group B consisted of 
37 extension division students in an evening 
class, and represented a less highly selected 
group than the first. 


Method 


Each student took the Classification In- 
ventory under three sets of conditions: (1) 
honest, (2) fake good over-all, and (3) fake
high self-confidence. Directions for the test 
under these three sets of conditions were as 
follows: 


1. Honest score. “This test has been con- 
structed quite differently from most personnel 
tests. It has been tried out in industry and has 
been phenomenally successful in certain instances. 
Since you are students of personnel psychology, 
my purpose in giving you the test is threefold: 
first, I want you to become acquainted with it by 
actually taking it; second, your standing in the 
test may be of assistance to you in planning your 
vocational future; third, we hope to build a key 
that will assist us in directing students toward or 
away from personnel work as a vocation. Please 
answer the questions as accurately as you can as 
they apply to yourself. It will be obvious from 
the questions that there are no right or wrong 
answers. It is wholly a matter of personal pref- 
erence on your part; therefore, answer the ques- 
tions as they apply to you.” 

2. Fake over-all good score. “Last time, you 
took the Classification Inventory under conditions 
in which you were instructed to answer the questions
as they apply to you. In taking the test
this time imagine yourself in an employment de- 
partment of a large and prosperous company. 
You have finished your education and are now 
starting out upon your life’s work. You want 
very much to get a job with this company and 
hope to spend the rest of your life working for 
them. Therefore, you want to make as good an 
impression as you can. Answer the questions so 
you will appear in the most favorable light to 
the personnel manager.” 

3. Fake high self-confidence score. “You have 
taken this test twice before. Today I would like 
to have you take it trying to fake your answers 
so as to make a high score in self-confidence.” 

Means and sigmas of scores obtained under 
the three conditions are given in Tables 1
and 2. Results support Mais (5). Students 
significantly increased their scores in self- 
confidence when they attempted to do so, the 
increase averaging approximately one sigma. 
This increase is both statistically and prac- 
tically significant. Statistical significance 
(t = 8.75) is beyond the .01 level. 
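A t for a change in mean score when the same students take the test twice is usually computed from difference scores. The sketch below shows that computation on hypothetical scores for five students; it is not a reconstruction of the t = 8.75 above, whose raw data are not given.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """t for the mean change from `first` to `second` in the same subjects.

    Works on difference scores d = second - first:
    t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Hypothetical self-confidence scores for five students.
honest = [-6, -2, -9, 1, -4]
fake_confident = [3, 4, -1, 8, 2]
t, df = paired_t(honest, fake_confident)
print(round(t, 2), df)
```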


Table 1

Mean Scores on Self-Confidence Key Under Three Sets of Conditions

                                        Honest or    Fake Over-all   Fake High
                                        Accurate     Good Score      Self-
                                        Score                        Confidence
Group A (N = 41 University students)
Group B (N = 37 Extension students)
Total Group (N = 78)

Table 2

Variability of Scores (Sigmas) on Self-Confidence Key Under Three Sets of Conditions

                                        Honest or    Fake Over-all   Fake High
                                        Accurate     Good Score      Self-
                                        Score                        Confidence
Group A (N = 41 University students)
Group B (N = 37 Extension students)
Total Group (N = 78)
The situation is different when we compare
“honest” scores with attempts to fake “over- 
all good” scores. The increase is neither 
statistically nor practically significant. Obvi- 
ously, these students were unable to increase 
their scores in self-confidence when attempt- 
ing to appear in the most favorable light. 
However, the similarity of mean scores does 
not give the whole story; and the test is not 
as satisfactory for employment use as might 
appear. The Pearsonian correlation between 
“honest” and “fake over-all good” scores was 
only .28. Obviously, many of the students 
had attempted to increase their score. Al- 
though scores in general were not raised, they 
were changed. This, of course, would be a 
serious defect if the test were being used for 
selection. 
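A small hypothetical example shows how an unchanged mean can hide large individual changes. The five scores below are invented; the "faked" administration simply reshuffles who scores high.

```python
from statistics import mean

def pearson_r(x, y):
    """Product-moment correlation computed from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

honest = [1, 2, 3, 4, 5]
faked = [3, 5, 1, 4, 2]
print(mean(honest), mean(faked), round(pearson_r(honest, faked), 2))  # 3 3 -0.3
```

The group mean does not move at all, yet the rank order of individuals is largely scrambled.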

Consideration of the foregoing findings 
raises an important question. Did students 
change their answers because they thought 
they could improve their scores or because 
they were instructed to fake? Essentially, 
they were directed to change answers and 
perhaps we should not be too surprised when
they followed instructions.

To investigate this question, another group 
of 68 students comparable to the previous 
Group A was given the Classification Inven- 
tory with instructions which avoided direct 
orders to fake answers and which simulated 
more nearly industrial selection and voca- 
tional guidance conditions. Directions for 
the test under these two conditions were as 
follows: 


1. Industrial selection. “In taking this test 
make the following assumptions. You have just 
finished your college work and are in the em- 
ployment department of the organization you 
hope to work for, applying for a job. This job 
you are applying for is exactly the kind of job 
you want so it is very important to you that you 
get it. The personnel manager informs you that 
the company has a battery of tests they give all 
their applicants and says, ‘This is the first test in 
the battery. It is called the Classification In- 
ventory. You will please read the directions and 
then answer the questions.’ ” 

2. Vocational guidance. “At the last meeting 
of the class you took the Classification Inventory 
assuming you were applying for a job. Today I 
would like to have you take the test again, mak- 
ing the following assumptions: You are having a 
great deal of trouble trying to decide what vocation
you should go into. You finally decide to
go to The Student Counseling Bureau to see if 
they can give you any assistance. The counselor 
informs you, ‘We have a battery of tests we 
should like to have you take. We have found 
the results very helpful in dealing with problems 
like your own. The first test in the battery is 
called the Classification Inventory. Will you 
please read the directions and then answer the 
questions.’ ” 


Again it was found that mean scores were 
essentially the same. The mean for the “Industrial”
situation was −1.28 (sigma of
8.98) and that for the “Vocational Guidance”
situation was −2.18 (sigma of 9.28).
This difference is not statistically significant
(t = .81).
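The t of .81 can be reproduced from the summary figures, provided the standard error allows for the correlation of .50 between the two administrations (the same 68 students took both). The sketch assumes the formula SE = √((σ₁² + σ₂² − 2rσ₁σ₂)/N); dividing by N − 1 instead also gives .81 to two places.

```python
import math

def t_correlated_means(m1, s1, m2, s2, r, n):
    """t for the difference between two means obtained from the same n people.

    The covariance term reduces the standard error of the difference:
    SE = sqrt((s1**2 + s2**2 - 2*r*s1*s2) / n).
    """
    se_diff = math.sqrt((s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2) / n)
    return (m1 - m2) / se_diff

# Figures reported in the text: Industrial -1.28 (sigma 8.98),
# Vocational Guidance -2.18 (sigma 9.28), r = .50, N = 68.
t = t_correlated_means(-1.28, 8.98, -2.18, 9.28, .50, 68)
print(round(t, 2))  # 0.81
```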

The correlation between “Industrial” and 
“Guidance” scores is .50. This is a substan- 
tial increase from the former coefficient of 
.28. It is apparent that the degree of faking 
is materially reduced by avoiding the direct 
suggestion to change answers. It should be 
pointed out that we are dealing in these ex- 
periments with a very intelligent and psy- 
chologically sophisticated group of subjects. 
That this is an important factor can be seen 
by comparing the results of Group B (Tables
1 and 2) with the other groups. The extension
students (Group B) increased their
scores considerably less than did the other 
groups comprised of more highly selected 
students. Be this as it may, the fact clearly 
stands out that all three groups materially 
changed their answers and scores under the 
different sets of conditions. Although modi- 
fication of directions toward greater realism 
decreased the extent of change, the resultant 
correlation of .50 is not encouraging. Obvi- 
ously faking is possible in the Classification 
Inventory, and probably occurs when the in- 
strument is used for employee selection pur- 
poses. Unfortunately, the extent of such 
faking cannot be determined for any single 
applicant. 

Although various forced-choice tests differ 
in the way in which items are selected, these 
differences would not appear to be related to 
attempts to fake. Presumably, findings from 
these experiments might well be expected 
to apply to the forced-choice technique in 
general. 
Summary


1. Scores on self-confidence were signifi- 
cantly raised when students attempted to 
raise their scores and knew the test meas- 
ured self-confidence. 

2. Scores on self-confidence were not increased
when students who did not know that
the test was scored for self-confidence
attempted to fake good over-all scores.

3. Scores on self-confidence were not increased
when students changed from a simulated
“industrial” to a simulated “guidance”
frame of reference without knowing
that the test measured self-confidence.

4. The way in which instructions were 
worded materially affected the extent of at- 
tempted faking. 

5. Although mean scores were not increased 
when students did not know what trait was 
being measured, individual scores were fre- 
quently changed to a considerable extent. 
This was evidenced by correlation coefficients 
far lower than reliability coefficients. So far 
as score interpretation is concerned, the at- 
tempt to improve scores is probably as im- 
portant as the ability to improve scores. 
How should a score at the fiftieth percentile 
be interpreted? Does it reflect an average 
amount of the trait being measured? Is it 
the result of a successful attempt to raise a 
low score? Or is it the result of an unsuc- 
cessful attempt to further increase what is 
already a high score? The answer is un- 
likely to be known in any single case. 

6. The Classification Inventory is not 
recommended for use in situations where per- 
sons are likely to be motivated to obtain 
good scores. 


7. Although these data were obtained on 
the Classification Inventory, there is no reason
to believe that different results would be ob- 
tained from any other forced-choice per- 
sonality test. 

8. It is the opinion of the authors that 
techniques other than the forced-choice tech- 
nique will have to be devised if the problem 
of malingering on personality tests is to be 
overcome. 
The Relationship Between the Judged Desirability of a Trait and
the Probability That the Trait Will Be Endorsed * 


Allen L. Edwards 


The University of Washington 


There is a rather common suspicion among 
many psychologists that subjects tend to give 
what are considered to be socially desirable 
responses to items in personality inventories. 
This suspicion has been given public ex- 
pression in a recent article by Gordon (3, 
p. 407) who comments upon “. . . the mo- 
tivation of a majority of respondents to 
mark socially acceptable alternatives to items, 
rather than those which they believe apply 
to themselves.” 

We have here two problems. One con- 
cerns the truthfulness of a subject’s answers 
to items in a personality inventory, i.e.,
whether the response accurately describes the 
subject. The answer to this question im- 
plies that we have available some independent 
criterion in terms of which the inventory 
response is to be evaluated. The other prob- 
lem concerns the relationship between a sub- 
ject’s response to an item and the social 
desirability of that item, i.e., whether the
subject tends to give a positive answer to 
an item that is socially desirable and a nega- 
tive answer to an item that is not. The 
answer to this question implies that we have 
available some measure of the social desira- 
bility of the item to which the response can 
be related. It is this problem we wish to 
report upon here. 


The Present Study 


The hypothesis to be investigated may be 
stated in this way: If the behavior indicated 
by an inventory item is socially desirable, 
the subject will tend to attribute it to him- 
self; if it is undesirable, he will not. This 
hypothesis may be put more precisely: The
probability of endorsement of personality
items is a monotonically increasing function of
the scaled social desirability of the items.

*This paper was presented before the Western
Psychological Association, Fresno, California, April
26, 1952. It is part of a research program made
possible by an appointment as a Faculty Research
Fellow of the Social Science Research Council.

To study the relationship between the 
probability of endorsement of personality 
trait items and the social desirability of the 
items requires that we determine independ- 
ently two measures: the probability of en- 
dorsement and the social desirability scale 
value of the items. This study thus consists 
of two parts: in the first, the scale values of 
the items are determined; in the second, the 
probability of endorsement is related to the 
independently determined scale values. 


Determining the Scale Values 


A total of 140 personality trait items, 
based upon Murray’s (4) discussion of needs, 
were written and edited. The items were 
selected so that 14 needs were investigated 
with 10 items supposedly indicative of each 
need. The items were arranged in 10 sets 
of 14 items each, so that each set consisted 
of one item relating to each of the needs. 
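One simple way to form 10 sets of 14 items with one item per need is to interleave the items need by need. The article does not say how items were assigned to sets, and the need labels and item texts below are placeholders, not the actual items.

```python
N_NEEDS, ITEMS_PER_NEED = 14, 10

# Placeholder labels; the actual needs and item wordings are not reproduced.
needs = [f"need {k}" for k in range(1, N_NEEDS + 1)]
items = {need: [f"{need}, item {i}" for i in range(1, ITEMS_PER_NEED + 1)]
         for need in needs}

# Set j collects the j-th item written for every need, so each of the
# 10 sets has exactly one item per need (14 items per set).
item_sets = [[items[need][j] for need in needs] for j in range(ITEMS_PER_NEED)]

print(len(item_sets), [len(s) for s in item_sets])
```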

The items were presented to subjects with 
instructions to judge the degree of social de- 
sirability of the behavior indicated by each 
item in terms of how the behavior would be 
regarded in others. Judgments were made 
in terms of nine successive intervals, with the 
lowest interval representing extreme unde- 
sirability and the highest extreme desirability. 
The rating system was explained in terms of 
a sample set of four items for which judg- 
ments had already been obtained. After 
these ratings had been discussed, the in- 
structions to the subjects concluded with the 
following statement: 

“Indicate your own judgments of the de- 
sirability or undesirability of the traits which 
will be given to you by the examiner in the 
same manner. Remember that you are to 
judge the traits in terms of whether you con- 
sider them desirable or undesirable in others. 
Be sure to make a judgment about each
trait.” 

The subjects judging the desirability of the 
items consisted of 86 men and 66 women, a 
total of 152 subjects. Twenty-six of the 
subjects were under 20 years of age, 97 were 
between 20 and 30 years of age, and 29 were 
over 30 years of age. 

Cumulative distributions of the judgments 
were made separately by age and by sex 
groups. For each item we then found the 
interval in which the median of the distribu- 
tion of judgments would fall. 
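Locating the interval that contains the median of an item's judgments needs only the cumulative frequencies. A minimal sketch with hypothetical counts for one item over the nine intervals:

```python
def median_interval(counts):
    """Return the 1-based interval (1..9) in which the median judgment falls.

    counts[k] is the number of judges who placed the item in interval k + 1.
    The median falls in the first interval whose cumulative count reaches
    half the total; if the cumulative count equals exactly half, the median
    sits on that interval's upper limit and the lower interval is returned.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one judgment")
    half = total / 2
    cumulative = 0
    for k, c in enumerate(counts, start=1):
        cumulative += c
        if cumulative >= half:
            return k

# Hypothetical judgments of one item by 86 judges.
print(median_interval([0, 1, 2, 5, 10, 20, 25, 15, 8]))  # 7
```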

In Figure 1, we show the plot of the 
women’s intervals against the corresponding 
values for the men. It may be noted that 
in the case of only two items would the 
medians be separated by as much as two 
intervals. For 43 of the items the medians 
might possibly be separated by as much as 
one interval. For the remaining 95 items 
the medians would all fall within the same 
interval. 

In the case of many of the items falling 
outside the principal diagonal of Figure 1, 
the medians would still be approximately the 
same for the reason that the medians of both 
distributions are close to the limit of the 
interval, but one happens to fall slightly 
above and the other slightly below the limit. 

A similar analysis of the judgments was 
made in terms of the age variable. Exami- 
nation of the separate distributions indicated 
that the scale values that would thus be ob- 
tained would be comparable and that little 
distortion would be introduced by pooling 
the judgments for all groups. 

On the basis of the combined distribu- 
tions, the scale values of the 140 items were 
found. The scale values were determined by 
the method of successive intervals (1). This 
method of scaling does not involve any as- 
sumption of equality of the successive rating 
intervals. 
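The logic of successive-intervals scaling can be sketched under simplifying assumptions of our own: each item's judgments are normally distributed with unit dispersion, cumulative proportions outside .02 to .98 are discarded, each interval width is the average difference between normal deviates at adjacent boundaries, and an item's scale value is the average of its boundary-minus-deviate estimates (under the normal model this locates the median). This illustrates the idea only; it is not the computational routine of reference (1).

```python
from statistics import NormalDist, mean

LOW, HIGH = 0.02, 0.98  # assumed cut-offs: extreme proportions give unstable deviates

def successive_intervals(cum_props):
    """Simplified successive-intervals scaling (equal-dispersion sketch).

    cum_props[j][k] = proportion of judges placing item j at or below
    boundary k. Returns (boundaries, scale_values), first boundary fixed at 0.
    """
    unit = NormalDist()
    z = [[unit.inv_cdf(p) if LOW <= p <= HIGH else None for p in row]
         for row in cum_props]
    n_bounds = len(cum_props[0])

    # Width between boundaries k and k + 1: average difference in deviates
    # over the items for which both proportions are usable.
    bounds = [0.0]
    for k in range(n_bounds - 1):
        diffs = [row[k + 1] - row[k] for row in z
                 if row[k] is not None and row[k + 1] is not None]
        if not diffs:
            raise ValueError(f"no usable items for width {k}")
        bounds.append(bounds[-1] + mean(diffs))

    # Under the model z[j][k] = bounds[k] - S_j, so every usable boundary
    # yields an estimate of S_j; average them.
    scale = []
    for row in z:
        estimates = [bounds[k] - zk for k, zk in enumerate(row) if zk is not None]
        scale.append(mean(estimates) if estimates else None)
    return bounds, scale

# Synthetic check: proportions generated exactly from the model are recovered.
true_bounds = [0.0, 0.8, 1.5, 2.5]
true_scale = [0.5, 1.2, 2.0]
unit = NormalDist()
props = [[unit.cdf(t - s) for t in true_bounds] for s in true_scale]
bounds, scale = successive_intervals(props)
print([round(b, 3) for b in bounds], [round(s, 3) for s in scale])
```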

After determining the widths of the suc- 
cessive intervals and the scale values of the 
items on the psychological continuum of so- 
cial desirability, an internal consistency test 
was applied (1). Using the 147 parameters
calculated from the data, it was possible to
reproduce the 1,120 independent, empirical
observations with an average error of .023.
This value, it may be mentioned, compares
favorably with that usually obtained from
internal consistency tests used when stimuli
are scaled by the method of paired comparisons.

Fig. 1. Interval in which the median of the women’s
distribution of judgments would fall plotted
against the interval in which the median of the men’s
distribution of judgments would fall.
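The internal consistency test amounts to regenerating the cumulative proportions from the fitted boundaries and scale values and averaging the discrepancies. This sketch assumes "average error" means the mean absolute difference, which the text does not define. The counts in the text are consistent with 8 boundaries between the 9 intervals (140 × 8 = 1,120 cumulative proportions; 140 scale values plus 7 interval widths = 147 parameters), though that reading is our inference.

```python
from statistics import NormalDist, mean

def average_absolute_error(cum_props, bounds, scale):
    """Mean |observed - reproduced| cumulative proportion over all cells.

    Reproduced proportions come from the fitted model p = Phi(t_k - S_j).
    """
    unit = NormalDist()
    errors = [abs(p - unit.cdf(t - s))
              for row, s in zip(cum_props, scale)
              for p, t in zip(row, bounds)]
    return mean(errors)

# Hypothetical fitted values and observed proportions for two items.
bounds = [0.0, 1.0, 2.0]
scale = [0.5, 1.5]
observed = [[0.30, 0.70, 0.94], [0.08, 0.30, 0.68]]
print(round(average_absolute_error(observed, bounds, scale), 3))
```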


Relationship Between Scale Values and 
Probability of Endorsement 


In the second part of this study, a sample 
of 140 pre-medical and pre-dental students 
responded to the same set of items for which 
we had previously determined the scale values 
on the psychological continuum of social de- 
sirability. This time, however, the items ap- 
peared in a printed form as a personality in- 
ventory. The inventory was part of a test 
battery which was administered for the Medi- 
cal and Dental Schools of the University 
of Washington. The instructions were those 
that are commonly used with personality in- 
ventories. A “Yes” response indicated that 
the subject believed that a given item was 
characteristic of himself and a “No” response 
that it was not. 

Item counts were made for each item, by 
means of IBM equipment, and the per cent 
responding “Yes” was then found for each 
item. This per cent is the proportion of the 
sample indicating that the behavior stated
by a particular item is characteristic of themselves.
The proportions may be taken as the
probability of endorsement of a particular
trait item for the sample at hand.

Fig. 2. Probability of endorsement of a trait item plotted against the social desirability scale value of
the item. The product-moment correlation coefficient is .871.

The probability of endorsement of each 
item was plotted against the previously, and 
independently, determined social desirability 
scale value of the item. This plot is shown
in Figure 2. On the Y-axis we have the
probability of endorsement and on the X-axis 
the social desirability scale value. It is ap- 
parent that the probability of endorsement 
is a linear function of the scaled desirability 
of the item. The product-moment corre- 
lation coefficient is .871. 
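Computing the endorsement proportions and their product-moment correlation with the scale values is straightforward. The sketch below uses hypothetical responses and scale values for five items, not the study's data.

```python
from statistics import mean

def endorsement_probability(responses):
    """Proportion of 'Yes' answers among the responses to one item."""
    if not responses:
        raise ValueError("no responses")
    return sum(1 for r in responses if r == "Yes") / len(responses)

def pearson_r(x, y):
    """Product-moment correlation computed from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical: five items, their scale values, and the answers of the
# same eight respondents to each item.
scale_values = [2.0, 3.5, 5.0, 6.5, 8.0]
answers = [
    ["No", "No", "No", "Yes", "No", "No", "No", "No"],
    ["No", "Yes", "No", "Yes", "No", "No", "Yes", "No"],
    ["Yes", "Yes", "No", "Yes", "No", "Yes", "No", "No"],
    ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No"],
    ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
]
p_yes = [endorsement_probability(a) for a in answers]
print(p_yes, round(pearson_r(scale_values, p_yes), 3))
```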


Discussion 


The data clearly indicate that the prob- 
ability of endorsement of an item increases 
with the judged desirability of the item. 


There is a slight indication of departure from 
linearity at the two extremes of the scale value axis. 
This is probably because of the limit placed upon 
the plotted points in terms of the Y-axis. The de- 
parture from linearity, however, is not statistically 
significant. 


This does not necessarily mean that the sub- 
jects are misrepresenting themselves on the 
inventory. It may be that traits which are 
judged as desirable are those which are fairly 
widespread or common among members of 
a culture or group. That is, if a pattern of 
behavior is prevalent among members of a 
group, it will be judged as desirable; if it is 
uncommon, it will be judged as undesirable. 
We might thus expect items indicating de- 
sirable traits to be endorsed more frequently 
than items indicating undesirable traits. 

It is also possible that the behavior in- 
dicated by an item with a high social de- 
sirability scale value is not common, but 
that the subject taking the inventory is try- 
ing, consciously or unconsciously, to give a 
good impression of himself. He therefore 
tends to distort his answers in such a way as 
to make himself out as having more of the 
socially desirable traits and fewer of the so- 
cially undesirable traits than might be the 
case if his behavior were evaluated in terms 
of some other independent criterion. 
Either one or both of the interpretations
presented would account for the relation- 
ship between probability of endorsement and 
scaled desirability of the item. I have no 
data to support the interpretation that the 
subjects misrepresented themselves on the 
inventory, but Ellis (2) in his recent review 
cites quite a few studies which would in- 
dicate that this is the case. 

If this is true, then in a personality inven- 
tory we should attempt to minimize the tend- 
ency for a given response to be determined 
primarily by the factor of social desirability. 
A suggested solution is to pair items indica- 
tive of different traits in terms of their social 
desirability scale values. If the subject is
then forced to choose between the two items,
his choice obviously cannot be upon the basis 
of the greater social desirability of one of 
the items. 
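One way to carry out the proposed pairing is a greedy pass over the items sorted by scale value, matching each item with the nearest later item of a different trait. The tolerance `max_gap`, the trait labels, and the greedy strategy are all our assumptions; an optimal matching could pair more items.

```python
def pair_by_desirability(items, max_gap=0.5):
    """Greedily pair items from different traits with close scale values.

    items: list of (item_text, trait, scale_value). Returns (pairs, leftovers).
    """
    ordered = sorted(items, key=lambda it: it[2])
    used = [False] * len(ordered)
    pairs = []
    for i, first in enumerate(ordered):
        if used[i]:
            continue
        for j in range(i + 1, len(ordered)):
            second = ordered[j]
            if second[2] - first[2] > max_gap:
                break  # sorted order: every later item is even farther away
            if not used[j] and second[1] != first[1]:
                used[i] = used[j] = True
                pairs.append((first, second))
                break
    leftovers = [it for it, u in zip(ordered, used) if not u]
    return pairs, leftovers

# Hypothetical items: (text, trait, social desirability scale value).
items = [("a", "trait_A", 6.1), ("b", "trait_B", 6.3), ("c", "trait_A", 3.0),
         ("d", "trait_C", 3.2), ("e", "trait_A", 8.0)]
pairs, leftovers = pair_by_desirability(items)
print(pairs, leftovers)
```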


Received June 3, 1952. 
A Note on “Interest Item Response Arrangement”


John V. Zuckerman 


Human Resources Research Office, The George Washington University 


In a private communication, Cronbach has 
called my attention to some aspects of my re- 
cent article (5) which should be clarified. 
Some methodological problems were not suffi- 
ciently explained in the original article, a 
basic assumption was left unstated, and in 
addition some violence was done in citing 
Cronbach’s position with respect to the effect 
of item response arrangement on measure- 
ment of traits or qualities. The points to be 
considered may be examined topic by topic. 


Reliability 


In a comparison of two interest test forms, 
FE (with 168 two-choice items) and OE (con- 
taining 112 L-I-D items), four product-mo- 
ment reliabilities (corrected for test length by 
the Spearman-Brown formula) were computed 
for four empirical keys, using odd-even tech- 
nique. With one exception, the reliabilities 
were similar for the two forms. For the key 
which was discrepant, OE had a higher re- 
liability. Odd-even reliabilities cannot be in- 
terpreted as estimates of test-retest reliabili- 
ties, particularly because, for L-I-D or similar 
scales, odd-even figures would be raised in 
the event that a transient response set were 
affecting performance throughout the test. 
Evidence of such response set might be found 
in the number and direction of weights for 
responses to the different categories. One 
may conclude that the split-half correlation is 
not the appropriate reliability measure for an 
empirically keyed scale, since a test might 
have low inter-item consistency but a high 
test-retest reliability. A low split-half cor- 
relation would obscure the high real relia- 
bility. 
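The Spearman-Brown correction used for the odd-even coefficients steps a half-test correlation up to full length: r = 2r½/(1 + r½). In the general form below, `factor` is the ratio of new to old test length; the input 0.60 is hypothetical.

```python
def spearman_brown(r_half, factor=2.0):
    """Step up a reliability coefficient for a test `factor` times as long.

    With factor=2 this corrects an odd-even (half-test) correlation to
    full length: r_full = 2 r / (1 + r).
    """
    return factor * r_half / (1 + (factor - 1) * r_half)

print(round(spearman_brown(0.60), 3))  # 0.75
```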

Retabulation of my data in terms of num- 
bers of positions weighted (see Table 1) 
shows that there are consistent tendencies for 
educators to like more things than engineers, 
teachers to dislike more things than administrators,
administrators to like more things
than teachers, and teachers to dislike more
things than educators in general.

These tendencies were capitalized on by the 
empirical scoring keys used in the study. To 
determine whether reliabilities of those L-I-D 
keys are lower than for forced-choice keys, 
the study would require the addition of test- 
retest reliability information which is not 
presently available. 


Validity 

My study was intended to compare the rela- 
tive discrimination provided by forced-choice 
and L-I-D item forms. The experimental de- 
sign involved the assessment of relative dis- 
crimination of four scales by rescoring blanks 
of most of the original subjects. The as- 
sumption was made that any shrinkage for 
OE scales would be the same for FE scales 
upon a cross-validation. Cronbach points out 
that when L-I-D or similar three-choice items 
are assigned weights, there are more possibili- 
ties that weights could arise out of chance 
differences than where forced-choice pairs are 
used. He states further that the more chance 
discriminations are counted in the score, the 
more the validity will shrink on a fresh sam- 
ple, based on mathematical considerations. 
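Cronbach's point, that chance discriminations counted in a key inflate validity in the keying sample and then shrink on a fresh sample, can be illustrated with a small simulation. The conditions are assumptions chosen for illustration and do not come from either study: dichotomous responses that are pure noise, a unit-weight empirical key, and two criterion groups of 100. Here `n_items` stands in for the number of weighted positions.

```python
import random
from statistics import fmean


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_shrinkage(n_items=100, n_per_group=100, seed=1):
    """Key items on pure-noise responses, then score a fresh sample with the same key."""
    rng = random.Random(seed)

    def sample():
        # 0/1 responses unrelated to group membership by construction
        people = [[rng.randint(0, 1) for _ in range(n_items)]
                  for _ in range(2 * n_per_group)]
        group = [1] * n_per_group + [0] * n_per_group
        return people, group

    def validity(key, people, group):
        scores = [sum(w * r for w, r in zip(key, p)) for p in people]
        return pearson(scores, group)  # point-biserial with the criterion groups

    people, group = sample()
    # empirical key: +1 if group 1 endorses the item more often, otherwise -1
    key = []
    for i in range(n_items):
        p1 = fmean(p[i] for p, g in zip(people, group) if g == 1)
        p0 = fmean(p[i] for p, g in zip(people, group) if g == 0)
        key.append(1 if p1 > p0 else -1)

    derivation = validity(key, people, group)
    holdout = validity(key, *sample())
    return derivation, holdout
```

In this setup the derivation-sample validity is substantial even though no item is truly valid, while the fresh-sample validity stays near zero. Raising `n_items` (more weighted positions) widens that gap.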


Table 1
Retabulation of Zuckerman's Data (5) in Terms of Numbers of Positions Weighted

Scale        Items        Number of Positions Weighted*
Name         Weighted     L+            L−
ED-ENG       94           66            16
ADM          38           11             9
TEA          41            1            18
AD-TEA       49           (illegible)   (illegible)

* Positive direction of weights in favor of educators for ED-ENG, for administrators and teachers for the ADM and TEA scales respectively, and for administrators in the AD-TEA scale. See original article for explanation of scale construction (5).
Probable shrinkage, according to Cronbach, depends on the factorial complexity of the item matrix, the number of items (or weights), the number of subjects tested, and the criterion reliability. No data are available from my study which bear on the problem of differential shrinkage for different item forms. To settle this point, a follow-up study will have to be made in which cross-validation procedures are used.

In a study intended to achieve the same 
aims as mine, but with a different method- 
ology and subject matter, Gordon (4) found 
that forced-choice personality questions pro- 
vided more discrimination than open-ended 
rating scale statements. Differences in cri- 
teria, measuring instruments, and methods 
prevent direct comparison of the studies, how- 
ever. 


Response Set 


While the general tone of Cronbach’s 
earlier articles on response set (1, 2) was 
unfavorable toward the use of item forms 
such as L-I-D, he did clearly raise the possi- 
bility that it is desirable to capitalize on re- 
sponse-set variance (especially 2, pp. 17, 27, 
and 28). My statements regarding his position (5, pp. 79 and 84) were in error. A recent study by Cronbach and another author (3) expresses Cronbach's current appraisal of the problem in terms of a mathematical consideration of profile analysis. Response-set
may appear as a mathematical factor entitled 
elevation. The investigator is advised to con- 
sider the meaning, if any, of the factor, and 
determine whether it is to be included in his 
scoring procedure. There is no basic dis- 
agreement between Cronbach and myself on 
this point. 


Received January 21, 1953. 
Published out-of-turn by the editor. 
Effects of the Nature of the Problem on LGD Performance *


Bernard M. Bass and Cecil R. Wurster 
The basic scheme of group situational tests
is to place examinees as a group in a prob- 
lem or work situation. Examiners observe, 
record, or rate examinees’ behavior as mem- 
bers of the group. The hypothesis under- 
lying the method is that the situational test 
is a valid sample of behavior for predicting 
future behavior in a real group situation. It 
has been verified by a number of studies 
(e.g. 2, 4, 11). 

Because of the wide variety of possible 
group situations, a large number of variations 
in group situational tests have been tried. 
Candidates for positions of leadership have 
been assessed: (a) in initially leaderless situa- 
tions (e.g. 4); (b) in situations where each 
candidate, in turn, has been appointed leader 
(e.g. 2); (c) in situations where a staff mem- 
ber has served as leader (e.g. 1); and (d) 
in situations where the leader has been elected 
by the group (7). Arbous and Maree have 
reported a median correlation of .67 between 
assessments based on observations of the 
same candidates in situations a and b. 

A problem for solution may or may not 
have been presented. Some studies have 
given participants a choice of problems to 
discuss (e.g. 11); others (e.g. 9) have al- 
lowed the group to originate the problem; 
while still others have assigned the problem 
(e.g. 4). 

Various kinds of problems have been pre- 
sented. These have included general inter- 
est problems such as “Select the ten outstand- 
ing leaders in the world today” (10); more 
specific problems such as: “Develop a pro- 
gram to train supervisors in this plant” (3); 
as well as case histories of human relations 
problems in which the group is asked to de- 
cide what the best course of action will be (8). 


* This study was aided by a grant from the Louisi- 
ana State University Graduate Council on Research. 
The writers wish to express their appreciation to Mr. 
Ernest McNeil, Mr. Jamie Dennis and the many 
others whose help made this study possible. 


The purpose of the present study was two- 
fold. The first aim was to see the extent to 
which a person’s successful leadership activity 
in an initially leaderless discussion changed 
when there was a systematic change in the 
nature of the problem and the persons with 
whom he was grouped. The second purpose 
was to see whether assessments based on 
some types of discussion situations were more 
related than others to various measures of 
company rank, education, intelligence, super- 
visory aptitude, age and appraisals of super- 
visory behavior on-the-job. 


Subjects and Method 


The subjects were a class of 23 students in 
an introductory psychology course and 131 
oil refinery supervisors. The 23 students 
were divided purposefully into three groups 
and each observed in a half-hour LGD with 
one of three types of conditions: (a) un- 
structured—participants originated problem 
for discussion; (b) general leader specifica- 
tions—e.g. participants developed a set of 
factors for choosing the world’s greatest lead- 
ers; (c) case history—e.g. participants de- 
cided whether a returning veteran should 
tell his wife about an illegitimate child he 
fathered overseas. 

Then, three new groups were formed so that as few members as possible of the first three groups were together a second time, and so that all participants could be assessed under a condition different from the one under which they were first tested. Finally, a third recombination was carried out so that all 23 participants were observed under each of the three conditions. Conditions b and c were altered slightly on each successive administration to avoid having participants specifically prepared. Thus, different kinds of specifications were demanded and different case histories were used in the successive administrations. LGD scores were based on one observer's ratings of the extent to which each participant exhibited successful leadership activity in a given discussion.¹

The 131 supervisors were assessed under one of four conditions: (a) unstructured; (b) general leader specifications; (c) in-plant leader specifications; (d) case history. Situation c concerned the specifications for selecting shift foremen, supervisors and so forth. The case histories concerned such problems as what Mike should do when his superior bawls him out in front of his subordinates, or what Harry's superior should do when he finds various faults with Harry's method of leading a work gang. Five unstructured discussions and four of each of the other types comprised the 17 group discussions. LGD scores were corrected for group size and for variations among observers' standards.²

¹ For a more detailed description of the scoring procedure, the reader is referred to (6).
² See footnote 1.


Results


Table 1 displays the intercorrelations among LGD scores obtained by the 23 participants on the basis of each of the three types of discussions.

Table 1
Correlations Among LGD Scores Earned by Participants Subjected to Three Different Types of Discussion*

                           Type of LGD
Type of LGD                Leader Specifications   Case History   Average
Unstructured               .58                     .66            .62
Leader Specifications                              .51            .54
Case History                                                      .58

*N = 23

Since an LGD test-retest reliability of .75 was reported for repeated discussions a week apart with changed participants but with no change of problem (8), and since it has been found even lower (r = .53) when a year intervenes between repeated measurements and the outside status of some participants is changed more than others (5), it was inferred from the results reported in Table 1 that some, but not very much, variation in LGD behavior could be attributed to variations in the nature of the stimulating situation.

The variations from .51 to .66 in intercorrelation and from .54 to .62 in average intercorrelation are most probably due to chance. Since the usual validities of these various types of LGD's range from .30 to .50, these intercorrelations suggest that including more than one type in a battery would not raise the validity of any two much above the validity of any one, although the reliability of the composite might be raised substantially.
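The argument can be checked with the standard formula for the validity of a unit-weighted sum of two standardized predictors, together with a Spearman-Brown step-up for the composite's reliability. The inputs are assumptions for illustration: a validity of .40 for each LGD type (the middle of the .30 to .50 range) and an intercorrelation of .58 (about the average of the three values in Table 1). Treating the two LGD types as parallel forms is also an assumption.

```python
import math


def composite_validity(r1y, r2y, r12):
    """Validity of the unit-weighted sum of two standardized predictors."""
    return (r1y + r2y) / math.sqrt(2 + 2 * r12)


def composite_reliability(r12, k=2):
    """Spearman-Brown step-up, treating the two LGD types as parallel forms (an assumption)."""
    return k * r12 / (1 + (k - 1) * r12)


print(round(composite_validity(0.40, 0.40, 0.58), 3))  # 0.45
print(round(composite_reliability(0.58), 3))           # 0.734
```

Under these assumptions, validity rises only from .40 to about .45, while the composite's reliability rises to about .73. This pattern fits the text's conclusion.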

Table 2 indicates the correlations between two independent clusters of highly interrelated variables and LGD scores earned by the refinery supervisors in one of four types of discussions. (The clusters were isolated by inspection of an intercorrelation matrix. Cluster I consisted of the supervisor's rank in the company, education, intelligence, supervisory aptitude and youth. Cluster II consisted of superiors' on-the-job appraisals of the supervisors by means of graphic and forced-choice rating scales (6).) Supervisors were classified into a lower and an upper echelon of management. Correlations between rank and LGD scores were biserial; the remaining were Pearson product-moment.
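The biserial coefficient used for the rank correlations can be sketched from the common textbook formula r_b = (M_p − M_q)/σ_t · pq/y. In this formula, y is the height of the unit normal curve at the point dividing the proportions p and q. The formula assumes that the echelon dichotomy reflects a normally distributed underlying variable. The studies do not give their computational details, so the function and its inputs are illustrative.

```python
from statistics import NormalDist, fmean, pstdev


def biserial(scores, upper):
    """Biserial r between a continuous score and a dichotomy.

    scores: continuous values (e.g. LGD scores); upper: parallel booleans
    (True = upper echelon). Assumes the dichotomy cuts an underlying normal variable.
    """
    hi = [s for s, u in zip(scores, upper) if u]
    lo = [s for s, u in zip(scores, upper) if not u]
    p = len(hi) / len(scores)
    q = 1 - p
    # ordinate of the standard normal curve at the point separating areas q and p
    ordinate = NormalDist().pdf(NormalDist().inv_cdf(q))
    return (fmean(hi) - fmean(lo)) / pstdev(scores) * (p * q / ordinate)
```

With a toy sample such as `biserial([1, 2, 3, 4], [False, False, True, True])`, the coefficient comes out above 1.0 (about 1.12). Biserial r can do this when its normality assumption is not met, so very high biserial values call for some caution.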

Chi square tests of the significance of the variations in correlations from one discussion type to the next* suggested that only one set of correlations, those between company rank and LGD scores, varied significantly at the
1 per cent level of confidence. It was in- 

* This test is described by Edwards, A. L. Experimental design in psychological research. New York: Rinehart, 1950. Pp. 133-135.
Table 2


The Correlation Between LGD Scores of Oil Refinery Supervisors Subjected to One of Four Types of 
Discussion Situations and Their Rank, Education, Intelligence, Supervisory 
Aptitude, Age and Superior’s Appraisals 


Out-Plant 


Unstruc- 
tured 
(a) 

No. of Groups : 
No. of Subjects 35* 
Cluster I 
Rankt 
Education 
Intelligence 
Supervisory Aptitude 
Youth 
Cluster II 
Graphic Appraisal 
Forced Choice Appraisal! 


87 


50 
34 
30 


02 
02 


Specifications 





Sub-Samples According to LGD Type 


In-Plant 
Leader 
Specifications 
(c) 


Case 
History 


(d) 


Leader 


(b) 


Total 


4 4 4 
ae a kg 


17 


Bl 
A7 
61 
.28 


O1 
A 
Al 
07 
31 


99 
57 
34 
54 
24 


— 11 
- 12 


— 04 
28 


— O01 
12 


* Because of missing information on intelligence, supervisory aptitude scores and superiors' appraisals, many of the sub-sample correlations of LGD scores with these variables are based on as few as 21 cases.

† Sub-sample variation in correlation with LGD significant at the 1 per cent level of confidence. Computed by means of biserial r; the others are Pearson correlations.


ferred from the correlation of .99 between 
rank and case history LGD scores that upper 
echelon supervisors were the sole leaders in 
such discussions; their tendency to exert lead- 
ership in the supposedly leaderless situation 
declined somewhat when discussions involved 
situations outside the company as in the out- 
plant leader specifications and the unstruc- 
tured discussions. (In the latter, the problem 
originated by the participants for discussion 
quite often concerned improving the town 
sewerage system, increasing civic pride, and 
so forth.) The one hypothesis worthy of 
further investigation drawn from these re- 
sults, therefore, was that a supervisor of high 
rank is most likely to play the role of leader 
among persons of lower appointed rank when 
the group problem specifically concerns situa- 
tions for which he has the high rank. 
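A chi-square test of whether several independent correlations estimate one common value is often computed through Fisher's z transformation. The weighted squared deviations of the z values from their weighted mean are referred to chi-square with k − 1 degrees of freedom. This sketch assumes that form. It has not been checked against the exact steps on Edwards' pp. 133-135, and Fisher's z is strictly justified for Pearson r, so applying it to biserial coefficients is only an approximation.

```python
import math


def correlation_heterogeneity(rs, ns):
    """Chi-square test that k independent correlations share one population value.

    rs: sample correlations; ns: corresponding sample sizes.
    Returns (chi_square, degrees_of_freedom).
    """
    zs = [math.atanh(r) for r in rs]      # Fisher's z transformation
    ws = [n - 3 for n in ns]              # inverse sampling variance of each z
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    chi_square = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return chi_square, len(rs) - 1
```

With four discussion types there are 3 degrees of freedom, and the 1 per cent critical value of chi-square is about 11.34.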

In evaluating these results, the reader should note that, as reported previously (6), there were very large restrictions in the range of most of the variables, especially supervisory aptitude scores and superiors' appraisals, because a large percentage of the examinees had been selected for their present posts on the basis of their high supervisory aptitude test battery scores.

The correlation of .54 between “case his- 
tory” LGD scores and supervisory aptitude 
suggested a valid consistency between assess- 
ments based on the case history LGD and the 
supervisory aptitude battery—a battery which 
gave substantial weight to paper-and-pencil 
tests of supervisory judgment. When the 
masking influence of rank, the lower relia- 
bility and validity of the graphic in compari- 
son to the forced choice appraisal and the 
great restriction in range of the appraisals 
were all taken into account, it was inferred 
from the correlation of .28 between case his- 
tory LGD scores and forced choice appraisals 
that the case history LGD is the most likely 
type of those investigated to provide a valid 
predictor of adequacy on-the-job, where the 
examinees are of different known organiza- 
tional rank, and previously have been selected 
by means of valid paper-and-pencil test 
batteries. 

Summary 


The purpose of the present study was to see how changing the nature of the problem confronting LGD participants affected their behavior.

LGD scores of 23 college students cor- 
related between .51 and .66 with repeated 
administrations where the composition of the 
group and the problem for discussion were 
systematically altered. These correlations 
were not much lower than the test-retest 
reliability (r =.75) of one type of LGD. 

The extent to which various personal fac- 
tors were associated with LGD performance 
of 131 oil refinery supervisors depended to 
some extent on the nature of the problem 
under discussion. Major findings were: 

1. A high-ranking supervisor is more likely 
to exert leadership in small discussion groups 
with supervisors of lower rank when the dis- 
cussion specifically concerns situations for 
which he has the high rank. 

2. The amount of successful leader activity in discussions of case histories of human relations problems appears related to paper-and-pencil predictors of supervisory success (r = .54) and, to a lesser extent, to forced-choice on-the-job appraisals of supervisory success (r = .28).


Received June 2, 1952. 
Mandell (4), among others, has hypothesized that candidates for employment or promotion who are assessed in a leaderless group discussion or group oral performance test should be unacquainted with each other; otherwise “they may defer to a candidate who has high prestige in the group, or who has a higher-level position.” The primary purpose
of this study was to investigate the extent to 
which a person’s performance in the LGD 
was influenced by his administrative rank 
outside the immediate stimulating situation. 

A number of sub-hypotheses were tested 
and a number of relationships were uncovered 
concerning the interactions between company 
rank, degree of successful leader activity in 
the LGD, rated performance by superiors as 
a supervisor, age, education, intelligence and 
knowledge and attitudes predictive of success 
in supervisory work. 

It was believed that the results would be 
of interest to those engaged in using the 
LGD to screen applicants for employment 
or promotion. They would also provide 
further information to a growing body of knowledge concerning leader-follower relations in small groups.


Subjects 


A total of 131 supervisors at a large oil refinery participated in leaderless group discussions. Of these, 61 were first level maintenance department supervisors; 22 were first level process and production supervisors; 18 were second level and 7 were third and fourth level supervisors in production, maintenance, or staff positions. In addition, 20 were engineers, accountants or other technicians who had no supervisory positions, while three had highly responsible technical positions which called for little direct supervision. The subjects ranged in age from under 30 to over 60, and in education from sixth grade to Ph.D. The average subject was 43 years old and a high school graduate.

* This study was aided by a grant from the Louisiana State University Graduate Council on Research. The writers wish to express their appreciation to Mr. Ernest McNeil, Mr. Jamie Dennis, and the many others whose help made this study possible.

One restriction which most probably served to severely attenuate the various relationships studied was caused by the high percentage of subjects who had been selected for their jobs by a previously validated battery of psychological tests. A further factor which probably served to restrict the range of observable differences was the large amount of supervisory training these subjects had received from various formal and informal programs.


Method 


Approximately 20 supervisors at a time met for a week-long supervisory training program. On the fourth day, they were subdivided into two or three groups of 6, 7, 8, 9, or 10 members and administered one of four types of leaderless discussions.¹ A total of 17 discussions was run, each observed by one of four trained raters. Directions and scoring² were similar to those of previous studies at Louisiana State University (e.g. 2).
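The scoring footnote describes two corrections: an adjustment by the mean score of groups of a given size, followed by a separate sten transformation of each observer's distribution. The sketch below is one possible reading of those steps. The exact form of the group-size adjustment (shifting each score by the grand mean minus the mean for its group size) is an assumption, as is the record layout. Stens are taken to have mean 5.5 and SD 2, rounded with halves going up and clipped to the range 1 to 10.

```python
import math
from collections import defaultdict
from statistics import fmean, pstdev


def to_stens(scores):
    """Convert one observer's scores to stens (mean 5.5, SD 2), rounding halves up, clipped to 1-10."""
    m, sd = fmean(scores), pstdev(scores)
    if sd == 0:
        return [6] * len(scores)  # no spread: everyone sits at z = 0
    return [min(10, max(1, math.floor(2 * (x - m) / sd + 6))) for x in scores]


def lgd_stens(records):
    """records: list of (observer_id, group_size, raw_score) tuples (hypothetical layout).

    Step 1 (assumed form of the group-size correction): shift each raw score by the
    grand mean minus the mean score of groups of that size.
    Step 2: transform each observer's adjusted scores to stens separately.
    """
    grand = fmean(raw for _, _, raw in records)
    by_size = defaultdict(list)
    for _, size, raw in records:
        by_size[size].append(raw)
    size_mean = {size: fmean(vals) for size, vals in by_size.items()}
    adjusted = [raw - size_mean[size] + grand for _, size, raw in records]

    by_observer = defaultdict(list)
    for i, (observer, _, _) in enumerate(records):
        by_observer[observer].append(i)
    stens = [0] * len(records)
    for idxs in by_observer.values():
        for i, s in zip(idxs, to_stens([adjusted[i] for i in idxs])):
            stens[i] = s
    return stens
```

A lenient observer who awards every participant ten more points ends up with the same stens as a strict one, which is the comparability the footnote aims at.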


¹ A separate report will deal with variations in performance on the LGD as a function of the nature of the discussion problem.

² The single observer rated on a 5-point scale the extent to which each participant exhibited the following 7 behaviors: (1) showed initiative; (2) spoke effectively; (3) clearly defined the problem; (4) offered good solutions; (5) influenced others; (6) motivated others; (7) led the discussion. A participant's LGD score was the sum of points he received, corrected for group size and for observer variations in points assigned. Scores were adjusted according to the mean score earned by participants of groups of a given size. The distribution of scores assigned by each observer was transformed into a sten distribution (3) in order to make all adjusted scores assigned by the different observers fairly comparable.

Two types of criteria of on-the-job success as supervisors were available: forced-choice and graphic appraisals by the subjects' superiors.
Forced-choice supervisory performance report ratings for 1950 by at least two of their superiors were obtained from the records of 123 of the subjects. These ratings, developed by Richardson, Bellows and Henry, Inc., had odd-even and equivalent-form reliabilities above .90 for the groups on which they were standardized. Inter-rater reliability was .69. Validity of the ratings, as measured by their tendency to differentiate previously identified above average, average and below average supervisors, ranged from .62 to .84 for the various forms and departments of the refinery. However, for the present restricted sample, inter-rater agreement was only .43. Since most subjects' average appraisals were based on independent ratings by as many as six superiors, the actual reliability of this measure was appreciably higher.
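The claim that averaging several superiors' ratings raises reliability can be quantified with the Spearman-Brown formula for the mean of k raters. This sketch assumes that the raters are interchangeable and that the single-pair agreement of .43 holds for every pair. Neither assumption is stated in the study.

```python
def reliability_of_mean_rating(r_single, k):
    """Spearman-Brown: reliability of the average of k raters with pairwise agreement r_single."""
    return k * r_single / (1 + (k - 1) * r_single)


for k in (1, 2, 6):
    print(k, round(reliability_of_mean_rating(0.43, k), 2))
# 1 0.43
# 2 0.6
# 6 0.82
```

Applied the same way, the graphic ratings' inter-rater value of .29 would rise to about .71 for six raters.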

Corresponding graphic ratings were likewise available for these 123 subjects. For this sample, inter-rater correlation was only .29. Otis Intelligence Test scores for 87 subjects and “supervisory aptitude” test scores for 92 subjects were also available. The supervisory battery test scores were an optimally weighted sum of scores of performance on a forced-choice test of supervisory judgment, an empirically scored forced-choice personality inventory, and scores on certain keys of the Kuder Preference Record. For the original standardizing group, the optimally weighted battery of scores correlated .62 with superiors' ratings of the subjects.


Results 


Table 1 displays the matrix of intercorrelations among LGD scores, company rank, education, intelligence, supervisory aptitude, youth,³ forced-choice appraisals and graphic superiors' appraisals. The first six have been grouped into one cluster of highly intercorrelated variables, while the last two form a second cluster. The average correlation between each variable and all the others of cluster I and of cluster II is also shown. All correlations reported are Pearson product-moment except those between rank and the other variables, which are biserial.⁴

³ Age reversed in sign to make positive most of the correlations of age with the variables of cluster I.

⁴ The small proportion of supervisors in the second, third and fourth echelons of management included in this study led the investigators, when correlating rank with the other variables, to combine them into one upper management group of 25 cases to compare with one first level supervisory group of 83 cases.

Criterion ratings, supervisory battery scores and intelligence test scores were available in standard score form with means of 20 and standard deviations of 5 for the original standardizing population of supervisors and candidates for supervisory positions. It should be noted that the sample used in this study was decidedly restricted in range on these significant variables. The sample was half a standard deviation higher in mean criterion ratings and supervisory aptitude scores than the original population from which many of its members were drawn. Restrictions in range were from 12 to 58 per cent, which severely attenuated the relationships reported.
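How strongly restriction in range attenuates a correlation can be illustrated with Thorndike's Case 2 correction. This correction assumes direct selection on one of the two variables. It is offered only as an illustration: the study reports no corrected values, and reading "restriction in range of 40 per cent" as a 40 per cent reduction in standard deviation is our assumption. For most variables here the selection was incidental, through the test battery, so Case 2 would be only approximate.

```python
import math


def correct_for_range_restriction(r, sd_ratio):
    """Thorndike's Case 2 correction for direct restriction on one variable.

    r: correlation observed in the restricted sample.
    sd_ratio: unrestricted SD divided by restricted SD (>= 1).
    """
    u = sd_ratio
    return r * u / math.sqrt(1 - r ** 2 + (r ** 2) * (u ** 2))
```

For example, if restriction had cut a standard deviation by 40 per cent (`sd_ratio = 1 / 0.6`), an observed r of .30 would correct to about .46.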

The first cluster of six variables had a mean intercorrelation of .48, while this cluster correlated .00 with the second cluster of two variables. Thus, it appeared that performance on the LGD was highly related to company rank (biserial r = .88) and to a lesser extent to the other variables closely associated with rank: education (r = .57), intelligence (r = .45), supervisory aptitude (r = .30) and youth (r = .19). LGD performance was unrelated to superiors' appraisals. Further analyses indicated that the mean LGD scores (in stens) for subjects from the first, second, and combined third and fourth echelons of supervision were 3.6, 6.7, and 6.7 respectively. According to an analysis of variance, these means differed significantly at the 1% level. No such significant difference was found when all first-line maintenance supervisors, whose mean LGD score was 3.4, were compared with all first-line process and production supervisors, whose mean was 4.0. Staff and technical men, not included in the above samples, had a mean LGD score of 5.4. This intermediate value probably reflected their subordinate position compared to upper echelon supervisors, but also their superior education and intelligence compared to first-line supervisors.
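The analysis of variance across echelons can be sketched as a standard one-way ANOVA. The individual sten scores are not reported, so the usage example below uses toy numbers only. With the study's group sizes of 83, 18 and 7 there would be 2 and 105 degrees of freedom, for which the 1 per cent critical F is roughly 4.8.

```python
from statistics import fmean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    scores = [x for g in groups for x in g]
    grand = fmean(scores)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_b = len(groups) - 1
    df_w = len(scores) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# toy data, not from the study
print(one_way_anova([1, 2, 3], [4, 5, 6]))  # (13.5, 1, 4)
```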

Company rank appeared significantly related to forced-choice criterion ratings earned (biserial r = .34) but not to graphic ratings. Rank correlated significantly with supervisory battery test scores (biserial r = .42). In this highly technical industry, it was not surprising to observe the almost complete interdependence of supervisory rank and education (biserial r = .98).
Table 1


Intercorrelations Among Company Rank, Education, Intelligence, LGD Score, Supervisory Aptitude, Youth, 
Superiors’ Appraisals, and Two Clusters of Highly Intercorrelated Variables 


Cluster I 
Com- 
pany Edu- Intelli- LGD 
Rank cation gence 
S8* 
im” 


45* 


Company Rank .98* 
Education 

Intelligence 

LGD Score 

Supervisory Aptitude 

Youth 


Forced-Choice Appraisal 
Graphic Appraisal 
Mean 

Standard Deviation 


12.0 
3.3 


20.4 
44 


4.5 
2.0 
* P < .01.  † P < .05.
Inc 


Super- 
visory 
Score Aptitude Youth Appraisal Appraisal 





Average 
Correlation 
Cluster II with 
FC 


Graphic Cluster Cluste: 





.42* 
a" 
.56* 
30" 


A 
aa” 
A3* 
A9 

.29* 


J 
03 
09 
— 01 
— 01 
.20t 


O7 
— .22f 
—.18t 
— 12 
— 08 

06 


20 
— 10 


48 
38 


06 
— .04 


.68* Eh 


22.8 
3.4 


Not enough data on intelligence available in upper echelons to compute correlation. 


Italicized coefficients are based on biserial r; the remainder are Pearson product-moment correlations 


Not enough cases were available to obtain the correlation between rank and intelligence, although it is expected to have been at least .50, since education and intelligence in this sample correlated .57.

To see if company rank was masking any relationships between the other variables, two attempts were made to study the relations among the other variables when rank was partialed out. Table 2 shows the appropriate partial correlations among LGD scores, supervisory aptitude, youth and forced-choice and graphic appraisals. Since education and rank were almost perfectly correlated, it was impossible to partial out one without eliminating the variance of the other.

When rank is partialed out, a large percentage of the variance in the amount of successful leader activity is accounted for by youth (partial r = −.70). At the same time, men rated as inadequate by their superiors on the forced-choice performance reports (partial r = −.69) and the graphic ratings (partial r = −.38) tend to attain high LGD scores. Correlations among the other variables remain unaffected by partialing out rank or else are reduced to negligible importance.
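The first-order partial correlation used here follows the standard formula r_xy.z = (r_xy − r_xz r_yz) / √((1 − r_xz²)(1 − r_yz²)). In the worked example, r = .88 (rank with LGD) and r = .34 (rank with forced-choice appraisal) are taken from the text. The zero-order LGD/forced-choice value of −.01 is our reading of a partly illegible cell in Table 1 and should be treated as an assumption. If that reading is right, the formula gives back the reported −.69.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = LGD score, y = forced-choice appraisal, z = company rank
print(round(partial_r(-0.01, 0.88, 0.34), 2))  # -0.69
```

Because rank is so strongly related to LGD score (.88), even a modest rank-appraisal correlation turns a near-zero zero-order correlation into a large negative partial. This is part of why the authors go on to question what the partials mean.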

Table 2
Partial Correlations Between LGD Score, Supervisory Aptitude, Youth, and Superiors' Appraisals, with Company Rank Held Constant Statistically

                           Supervisory              FC          Graphic
                           Aptitude      Youth      Appraisal   Appraisal
LGD Score                  −.16          −.70       −.69        −.38
Supervisory Aptitude                     .13        −.17        −.12
Youth                                               .06         .10
Forced-Choice Appraisal                                         .70

There was some doubt about the meaning of these partial correlations. First, in order to use partial r, it was necessary to assume that the biserial correlations between rank and the other variables were estimates of the product-moment correlations between rank and those variables. Second, partial r estimates the amount of the relationship between a pair of measures, ruling out the effects of a third, when the remaining variances of the pair of measures are equal. But if company rank is held constant, it will impose different restrictions in range on each member of a pair of variables, so that partial r describes a situation which seldom exists in reality. Thus, if the variances of forced-choice appraisal and LGD scores could be equalized after company rank was held constant, then a correlation of −.69 would exist between them. But it is seldom that this equalization of variance occurs in nature. Therefore, a second approach, purposive sampling, was used to rule out the effects of company rank on the correlations between LGD scores and the other variables. Table 3 shows these correlations for first level supervisors only and for upper level supervisors only. From the results in Table 3, it was inferred that when rank is held constant experimentally, the correlations between LGD performance and the other variables tend to reduce to insignificance. This was not unexpected, since LGD score and rank correlated .88.

As shown in Table 3, first-line supervisors with high LGD scores tended more than upper echelon supervisors to be considered inadequate on-the-job, although the differences were not significant. The less extreme results obtained through this sampling procedure, as compared with the partial r approach, were attributed to the fact that no attempt was made to equalize the variance of each two variables correlated in the upper and lower supervisory ranks, while partial r forced such equalization.


Conclusions 


The results of this study are a strong con- 
firmation of a number of common-sense ob- 
servations as well as research findings about 
the influence of a person’s rank, prestige or 
status in an organization and his tendency 
to play the role of leader in small groups of 
members from that organization even where 
there is no appointed leader for the immedi- 
ate situation. 

The biserial correlation of .88 between a participant's company rank and his leader behavior in a supposedly initially leaderless discussion is consistent with a number of other studies. For example, Bass and Coates (1) found a significantly greater increase in LGD scores on a retest a year after the original test among ROTC cadets who had been promoted to positions of cadet first lieutenant or higher during the period between test and retest than among the remainder (who had become cadet second lieutenants). Similarly, Michigan Conference Research studies suggest that


Table 3
Correlations Between LGD Score and Supervisory Aptitude, Youth, and Superiors' Appraisals, with Company Rank Held Constant by Purposive Sampling

                           Company Rank
                           First Level        Second, Third or Fourth Level
Variable                   r with LGD Score   r with LGD Score
Supervisory Aptitude       .29†               .21
Youth                      .09                .02
Forced-Choice Appraisal    .12                .04
Graphic Appraisal          .20                .12

† P < .05
when three-man appraisal boards meet, the
conclusions reached are those in agreement 
with the member of highest status. Also ex- 
ecutives appear to call so-called planning con- 
ferences of their subordinates mainly to ob- 
tain subordinates’ agreement on what the 
executive has already decided to do (5). 

In previous studies of unacquainted candi- 
dates or candidates of similar initial rank 
there have uniformly been reported, by at 
least 11 separate investigations, correlations 
ranging from .30 to .70 between LGD scores 
and various criteria of supervisory success 
or leadership potential. In the present study, 
these correlations were close to zero sug- 
gesting strongly that Mandell’s suspicions are 
confirmed concerning the general lack of 
validity of the LGD among acquaintances, 
especially where they differ greatly in initial 
prestige or rank. 

The theoretically significant negative correlations between LGD scores and criteria of supervisory success when company rank is partialed out statistically pose more questions than they answer. These include:

1. Is one of the requirements for success as a first-line supervisor the ability to play a subordinate role in a social situation with those of higher company rank, even when they are not his immediate superiors and the situation is outside plant jurisdiction? Or, on the other hand, are organizations discouraging upward communication from lower-echelon management, as well as hindering executive development, by appraising as inadequate those first-line supervisors who give suggestions, opinions and information, and who take initiative and show originality in their interactions with their superiors?

2. Is it the younger, less secure supervisor 
who is most conscious of rank and least apt 
to ignore it, even in unstructured social situa- 
tions? If so, what effect does this have on 
the introduction of new ideas into an or- 
ganization? 
3. Since active trainees (those who receive and take advantage of the opportunity to make decisions) usually learn more than passive ones, to what extent are first-line supervisors handicapped when placed in conference training with upper-echelon personnel?

Summary 


LGD scores of 131 oil refinery supervisors were correlated with their rank in the refinery, their education, intelligence, “supervisory aptitude” test scores, and superiors’ appraisals of their on-the-job performance. LGD scores correlated .88 with rank, .57 with education, .45 with intelligence, .30 with supervisory aptitude, and −.19 with age. Most of these correlations could be attributed to the influence of rank on all these variables. When rank was partialed out statistically, LGD scores were highly positively related to age and highly negatively related to superiors’ appraisals. It was concluded that, in general, the LGD is not valid where participants are of known different rank. The complexity of the outcomes of this study raises some interesting questions about the validity of superiors’ appraisals as ultimate criteria of supervisory performance and about the influence of formal rank on the behavior of conference participants of differing rank.


Received May 19, 1952. 
One non-political point of interest during the recent election campaign was the level at which each of the rival presidential candidates was speaking. For instance, some people maintained that Stevenson was doing himself an injustice because he was speaking over the heads of his audience, i.e., he was being too scholar-like, pedantic, academic, formal, learned, etc. On the other hand, some chided the same candidate for being a joker or a punster. In order to gain some insight into the legitimacy of these arguments, a Flesch readability analysis was performed on the texts of six of Stevenson’s major talks. For comparative (control?) purposes, the texts of the major talks given by Eisenhower on the same dates were also analyzed.


Method 


The texts of the major talks of each of the 
rival candidates appeared in the Philadelphia 
Inquirer on the morning following the talks. 
In some cases the newspaper saw fit to delete 
certain parts of the speeches of each candi- 
date. In these cases, only the “selected text” 
was available for analysis. On October 28, 
Eisenhower’s major appearance was a tele- 
vision show, in which Eisenhower answered 
questions posed by various Republican com- 
mittee women. In this case, the text of the 
president-elect’s replies to these questions was 
analyzed. Originally, it was our intent to 
analyze the texts of the six talks of each can- 
didate given just prior to the election. How- 
ever, the Sunday, November 2, issue of the 
Inquirer did not contain the previous day’s 
talks of either candidate. Neither candidate 
spoke on Sunday, November 2. Thus, the 
texts of the major talks of Eisenhower and 


Stevenson on October 27, 28, 29, 30, 31, and 
November 3 were included in the present 
study. 

Flesch¹ recommends, as a sampling procedure, that every third paragraph be taken and that the first 100 words of each sampled paragraph be analyzed. However, since many of each candidate’s paragraphs ran under 100 words, the entire texts were analyzed.
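Because whole texts rather than 100-word samples were analyzed, the raw counts have to be converted to per-100-word rates before Flesch’s formulas apply. The sketch below assumes the 1948 formulas as they are usually cited: RE = 206.835 − 0.846 wl − 1.015 sl and HI = 3.635 pw + 0.314 ps. The whole-speech counts in the usage lines are hypothetical. Deciding what counts as a syllable, a personal word, or a personal sentence is left to the analyst, following Flesch’s rules.

```python
def per_100_words(count: int, words: int) -> float:
    """Convert a raw count over a whole text to a rate per 100 words."""
    return 100.0 * count / words


def reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (as commonly cited) from whole-text counts."""
    wl = per_100_words(syllables, words)  # syllables per 100 words
    sl = words / sentences                # mean sentence length in words
    return 206.835 - 0.846 * wl - 1.015 * sl


def human_interest(words: int, personal_words: int,
                   sentences: int, personal_sentences: int) -> float:
    """Flesch Human Interest (as commonly cited) from whole-text counts."""
    pw = per_100_words(personal_words, words)    # personal words per 100 words
    ps = 100.0 * personal_sentences / sentences  # personal sentences per 100 sentences
    return 3.635 * pw + 0.314 * ps


# Hypothetical counts for one speech text (not data from this study).
print(round(reading_ease(words=2400, sentences=120, syllables=3480), 1))
print(round(human_interest(words=2400, personal_words=168,
                           sentences=120, personal_sentences=30), 1))
```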


Results 


The results of the analysis are presented in 
Table 1. 

Three of the major talks given by Stevenson during the final eight days of the campaign were classified as “Standard” in reading ease by the Flesch analysis, and three were classified as “Fairly difficult.” For the same period, four of Eisenhower’s talks were classified as “Standard,” one as “Difficult,” and one as “Fairly difficult.” The mean reading ease score of Eisenhower’s speeches fell in the “Fairly difficult” range and that of Stevenson’s in the “Standard” range, but the actual difference of only 1.5 points is negligible. According to Flesch, a “Standard” style of reading ease is found in Digests, and a “Fairly difficult” style is characteristic of academic publications.
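The verbal labels come from Flesch’s score bands, and this explains how a negligible difference can still produce different descriptions. The sketch below assumes the commonly cited bands (Standard = 60 to 70, Fairly difficult = 50 to 60, and so on). It also assigns a score that falls exactly on a boundary to the easier band, a convention the paper does not specify. Eisenhower’s mean score is not legible in Table 1, so it is inferred here from Stevenson’s 60.3 and the stated 1.5-point difference.

```python
# Lower bound of each reading-ease band, easiest first (assumed Flesch bands).
RE_BANDS = [
    (90.0, "Very easy"),
    (80.0, "Easy"),
    (70.0, "Fairly easy"),
    (60.0, "Standard"),
    (50.0, "Fairly difficult"),
    (30.0, "Difficult"),
]


def describe_reading_ease(score: float) -> str:
    """Return the verbal description for a reading ease score."""
    for lower, label in RE_BANDS:
        if score >= lower:
            return label
    return "Very difficult"  # below 30


stevenson_mean = 60.3
eisenhower_mean = stevenson_mean - 1.5  # inferred, not read from the table
print(describe_reading_ease(stevenson_mean), "/", describe_reading_ease(eisenhower_mean))
```

The two means sit on either side of the 60-point boundary, so their labels differ even though the scores are nearly identical.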

From the standpoint of “human interest,” the Flesch analysis indicated the style of the texts of four of Eisenhower’s speeches to be “interesting” and two to be “highly interesting.” The styles of five of Stevenson’s speeches were “interesting” and one was “highly interesting.” According to Flesch, an “interesting” style is found in the Digests, while a “highly interesting” style is found in the New Yorker.

1 Flesch, R. A new readability yardstick. J. appl. Psychol., 1948, 32, 221-233.

Table 1

Flesch Reading Ease and Human Interest Scores and Descriptions for Texts of Six Pre-election Talks by Eisenhower and Stevenson

                        Reading Ease                            Human Interest
            Eisenhower          Stevenson            Eisenhower            Stevenson
Date        Description         Description          Description           Description
Oct. 27     Fairly difficult    Standard             Interesting           Interesting
Oct. 28     Standard            Fairly difficult     Interesting           Interesting
Oct. 29     Standard            Fairly difficult     Highly interesting    Highly interesting
Oct. 30     Standard            Standard             Highly interesting    Interesting
Oct. 31     Difficult           Fairly difficult     Interesting           Interesting
Nov. 3      Standard            Standard             Interesting           Interesting
Mean        Fairly difficult    60.3 Standard        Interesting           34.0 Interesting
S.D.                            3.8                                        5.6

Thus, using the Flesch analysis as a yardstick for the period investigated, we have little evidence that Stevenson approached the academic level, or that his speeches were more difficult to understand than Eisenhower’s. On the other hand, by a Flesch analysis, there was a slight tendency for Eisenhower’s speeches to be more “interesting.”


Received January 29, 1953. 
Early publication. 
Farr, Jenkins, Paterson, and England (5)
have presented evidence showing that the sim- 
plified reading ease formula by Farr, Jenkins, 
and Paterson (4) yields scores quite in agree- 
ment with those obtained with the original 
Flesch formula (6). The Flesch formula was 
simplified in order to provide a method which 
“would obviously be much faster and would 
require no knowledge of syllabification on the 
part of the analyst” (4, p. 333). The presen- 
tation of the simplified formula, however, was 
not greeted with universal acceptance. Klare 
(8) and Flesch (7) raised two principal ob- 
jections. First, they doubted the claimed 
time economy of the new method. This argu- 
ment was met with a study by the Minnesota 
group (5) in which a number of graduate stu- 
dents determined reading ease scores by both 
methods. The new method was found to be 
much faster. 
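The two formulas differ only in the word-length measure they use. The sketch below places them side by side and assumes the coefficients as they are commonly cited for the original Flesch formula (6) and for the Farr, Jenkins, and Paterson simplification (4). In it, sl is mean sentence length in words, wl is syllables per 100 words, and nosw is one syllable words per 100 words. As a rough check, it plugs in the house-organ means for Analyst 1 from Table 1 of the reliability study reported later in this section (sl = 20.3, wl = 158.7, nosw = 62.4). The results come out close to the reported mean RE scores of 51.7 and 47.8; small differences are presumably due to rounding and table lookup.

```python
def flesch_re(sl: float, wl: float) -> float:
    """Original Flesch reading ease from sentence length and syllables per 100 words."""
    return 206.835 - 0.846 * wl - 1.015 * sl


def fjp_re(sl: float, nosw: float) -> float:
    """Farr-Jenkins-Paterson simplified reading ease from one syllable words per 100 words."""
    return 1.599 * nosw - 1.015 * sl - 31.517


# House-organ means for Analyst 1 (Table 1 of the reliability study).
print(round(flesch_re(20.3, 158.7), 1), round(fjp_re(20.3, 62.4), 1))
```

The simplified version replaces a syllable count with a count of one syllable words, which is the operational difference this study sets out to test.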

But their second objection to the new method is more formidable and has not yet been answered. It holds that counting one syllable words is less accurate than counting syllables. The logical basis for the position is sound: syllable counting involves attentive study of each word, whereas a person picking out one syllable words might just scan the passage. This study was designed to bear on the question. A comparison was made of the accuracy of the two counting methods. Other aspects of reading ease calculation were also investigated.

Method

Untrained³ subjects for the experiment included 72 male and 72 female freshman students. All were enrolled in freshman English in the School of Agriculture and Home Economics at the University of Minnesota.⁴ These students had never been trained in, and probably had never heard of, the techniques employed in making readability analyses.

Since the number of syllables is inversely related to the number of one syllable words, it became necessary to control the difficulty of the test material.⁵ Further, it was felt that ability to perform readability counts may be related to a person’s reading ability. Because of this, the subjects were divided into four relatively homogeneous groups on the basis of their paragraph comprehension scores on the Nelson-Denny Reading Test. The time taken to perform the counts was also measured. Subjects within each group were then randomly assigned to conditions imposed by the following factorial design:

                                  Reading Ability (Nelson-Denny Paragraph Comprehension)
Material     Type of Count        Upper 25%    Second 25%    Third 25%    Lower 25%
Easy         Syllables
             OS Words
Medium       Syllables
             OS Words
Difficult    Syllables
             OS Words

OS Words = one syllable words.
1 Factorial design experiments and the basic computations involved are discussed by Edwards (3) and Nelson (9).

2 Formerly research assistant in the Industrial Relations Center.

3 Farr, Jenkins and Paterson (4) have stated that their simplification should remove the aura of complexity from the Flesch formula and make it more useful to practical men in their daily work. Because of this, we felt it would be most desirable to use naive or untrained subjects who were not oriented in favor of either method of counting.

4 The authors wish to express thanks to Professor Ralph G. Nichols and Professor James I. Brown, who offered the cooperation of their department.

5 The test material was carefully chosen so as to cover the entire range of difficulty. “Easy” registered above 90; “medium” in the 40’s; “difficult” below 10. Passages selected registered nearly the same (± 5) RE scores on both formulas.

A test form was developed for each of these six conditions. The first page of each was a simple explanation of the subject’s task. Explanation was facilitated by the use of two short examples, which were used for all six conditions. The second page included a 50-word practice passage of the same difficulty as the test passage. The third page was devoted to two test passages of 100 words each. The subjects were instructed to perform the proper count separately for each passage and record their answers in the spaces provided.

Oral instructions were framed to emphasize accuracy over speed. Subjects were asked to check their completed work and then to record the letter appearing on the blackboard. A new letter was placed on the blackboard every 10 seconds, thus providing time scores without emphasizing the speed factor.

Four factorial designs as shown above were used in the experiment to provide information separately for male and female subjects, and for error and time scores. Since 72 subjects of each sex were available, plans provided for three replications in each cell. Several students were absent on the day of the administration. These were immediately sent a test form by mail. In all, 24 students were absent; 20 were located; 16 returned completed test forms. These mailed returns, of course, did not include a time score. It was, therefore, necessary to reduce the number of replications in the two time-score designs to two per cell. Thus, error data for five boys and three girls were missing.

Since statistical analyses of the experimental data required an equal number of replications per cell, the missing scores were estimated. Estimates were based on the best information available, i.e., the scores of the two subjects in the same cell as the missing value. The mean of the two scores was used. The degrees of freedom for error were reduced in accordance with the suggestion of Cochran and Cox (2, p. 73).

Homogeneity of variance was tested by means of Bartlett’s Test (3). In no case was chi-square sufficiently large to reject the hypothesis of equal variances. Plotting the data indicated the distributions to be non-skewed and platykurtic. Since skewness is the most serious deviation from normality (for purposes of variance analysis), and since Cochran (1) has shown that non-normality does not seriously alter conclusions derived from variance analysis, no test was made of the assumption of a normally distributed parent population.
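The missing-score procedure described in the Method section can be illustrated with a short sketch. It assumes the error term is the within-cell term of the full 4 × 3 × 2 design (24 cells per sex, three replications per cell). It then follows the stated rule: each missing score is filled with the mean of the two observed scores in its cell, and one error degree of freedom is subtracted for each estimated value. The per cent error values in the usage line are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Tuple


def fill_cell(cell: List[Optional[float]]) -> Tuple[List[float], int]:
    """Replace missing scores (None) by the mean of the observed scores in the same cell."""
    observed = [x for x in cell if x is not None]
    fill = mean(observed)
    n_missing = len(cell) - len(observed)
    return [fill if x is None else x for x in cell], n_missing


def error_df(n_cells: int, reps_per_cell: int, n_estimated: int) -> int:
    """Within-cell error degrees of freedom, reduced by one per estimated score."""
    return n_cells * (reps_per_cell - 1) - n_estimated


filled, k = fill_cell([2.0, None, 4.0])  # hypothetical per cent errors
print(filled, k)
print(error_df(24, 3, 5), error_df(24, 3, 3))  # boys (5 missing), girls (3 missing)
```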


Results 


Table 1 shows the results of the analysis of 
variance. It is seen that some of the sources 
of variation are significantly different from 
the variation due to error. These results may 
be interpreted more easily, however, by refer- 
ring to Table 2 which gives the means and 
standard deviations for accuracy and time re- 
quired. 

Both boys and girls performed the one syl- 
lable word count more accurately than the 
syllable count. For the boys, the difference 
was significant at the 5 per cent level. For 
the girls, however, the difference was not sta- 
tistically significant. Both boys and girls also 
did the one syllable word count in 25 per 
cent less time than that required for the syl- 
lable count. The differences were statistically 
significant. 

These findings suggest that the new for- 
mula is superior with respect to both ac- 
curacy and time required to perform the 
counts. Since the subjects were unfamiliar 
with the counting methods required by read- 
ability formulas, we can state with assurance 
that the F, J, and P simplified formula is in- 
herently easier to perform. This finding, com- 
bined with previous evidence (5), shows that 
persons will perform the new count more rap- 
idly and with greater accuracy regardless of 
the degree of their skill. 

Data in Table 1 show a significant interaction for boys between type of count and difficulty of material. Table 3 shows this interaction effect more clearly. The one syllable word count was performed less accurately for easy material but more accurately for difficult material. Expressing the error as a percentage may obscure the practical significance of these differences. This is because there are more total syllables than one syllable words in any given reading passage. Therefore, because RE scores depend on the absolute number of units counted, a given variation in reading ease score reflects a larger per cent error in the one syllable word count than in the syllable count. This means that the simplified formula tolerates a greater per cent error than the original formula.

Table 3

Mean Per Cent Error Made by Boys as Related to Difficulty Level and Type of Count

                            Mean Per Cent Error
Difficulty of Material      Syllables      One Syllable Words
Easy                        −0.55          −1.67
Medium                      −0.58           0.90
Difficult                   −3.38           0.76

1 Data are not included for girls since for them the effect was not statistically significant.
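The argument about tolerated error can be made concrete. The sketch below assumes the coefficients commonly cited for the two formulas: 0.846 per syllable per 100 words in the original, and 1.599 per one syllable word per 100 words in the simplification. It takes the house-organ means from Table 1 of the reliability study later in this section (158.7 syllables and 62.4 one syllable words per 100 words) as typical counts. It then converts a one-point change in RE into the per cent counting error that would produce it.

```python
def pct_error_per_re_point(coefficient: float, typical_count: float) -> float:
    """Per cent counting error that shifts the RE score by one point."""
    units = 1.0 / coefficient             # counting units equivalent to 1 RE point
    return 100.0 * units / typical_count  # as a per cent of the typical count


syllable_pct = pct_error_per_re_point(0.846, 158.7)     # original formula
one_syllable_pct = pct_error_per_re_point(1.599, 62.4)  # simplified formula
print(round(syllable_pct, 2), round(one_syllable_pct, 2))
```

Under these assumptions, a one-point RE shift corresponds to roughly a 0.74 per cent error in the syllable count but roughly a 1.0 per cent error in the one syllable word count. This is the sense in which the simplified formula tolerates a larger per cent error.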


Discussion 


The findings do not relate to the degree of 
agreement between scores derived via the two 
formulas. They relate instead to whether or 
not the new formula is operationally a simpli- 
fied version of the old or whether or not it is 
simplified in name only. The results suggest 
that the revised formula is superior with re- 
spect both to time taken and accuracy with 
which it is applied. The use, in this study, 
of untrained subjects shows that this su- 
periority does not depend on training or previ- 
ous experience but resides instead in the dif- 
ferent method of counting required by the 
new formula. This formula, therefore, ap- 
pears to be operationally a simplified version 
of the old one. 


Summary 


A factorial experiment was undertaken to
study the effects of various factors on the ac- 
curacy and time taken by naive subjects to 
perform readability counts. The factors in- 
vestigated were: (1) difficulty of reading ma- 


terial; (2) the type of count performed; (3) 
reading ability of persons performing the 
counts; and (4) sex. 

The major finding was that the counting of 
one syllable words could be done in about 
three-fourths the time required for counting 
syllables. Boys performed the former count 
more accurately than the syllable count. This 
difference was not statistically significant 
among the girls. 

A significant interaction effect was found 
between difficulty level and type of count. 
The syllable count was performed more ac- 
curately for easy material; the one syllable 
word count was performed more accurately 
for difficult material. Neither accuracy nor 
time taken was significantly associated with 
reading ability or sex. 

It has been concluded that the new F, J, 
and P formula is truly simplified since it can 
be applied with a greater degree of accuracy 
and requires less counting time. 


Received February 24, 1953. 
Early publication. 
Reliability of the Original and the Simplified Flesch Reading
Ease Formulas 


George W. England, Margaret Thomas, and Donald G. Paterson 


University of Minnesota * 


Both Klare (7) and Flesch (5) in their at- 
tacks on the Farr, Jenkins, and Paterson (2) 
simplification of the Flesch reading ease for- 
mula (4) suggested that the reliability of the 
F, J, and P simplification formula would be 
lowered. Klare asserted that the simpler 
method would magnify each counting error 
and thus decrease reliability. Flesch attacked 
the idea that r between the original and the 
simplified reading ease scores would be higher 
for more heterogeneous materials¹ than for
the employee handbooks used in developing 
the F, J, and P simplified formula and, in ef- 
fect, implied that the reliability of the Flesch 
formula is impaired by the F, J, and P simpli- 
fied formula. 

Farr, Jenkins, Paterson, and England (3) 
were able to answer the Klare and Flesch 
criticisms with respect to the relative knowl- 
edge of syllabification required for both for- 
mulas and also with respect to the speed with 
which the new, simplified formula can be applied. A mean time of 82 seconds versus 147 seconds, a saving of 65 seconds per 100-word sample, was found. Discussion of the problem of
reliability, however, was necessarily postponed 
with the following statement, “A thorough- 
going study of the reliability of both methods 
would be needed to settle this issue” (3, p. 
56). 

The present paper reports on this aspect of 
the problem. 


* England was research assistant in the Industrial 
Relations Center, Miss Thomas was a graduate stu- 
dent in psychology, and Paterson was professor of 
psychology and member of the staff of the Indus- 
trial Relations Center at the time the study was 
made. England is now with Personnel Research Staff 
of RCA at Camden, N. J., and the other two authors 
continue in their same roles at the University of 
Minnesota. 

1 The idea that r would be lowered if more hetero- 
geneous materials were used is naive, statistically 
speaking. 
Procedure


Data from House Organs. During the spring quarter of 1952, 13 pairs of analysts computed reading ease scores by both formulas for each of 196 hundred-word samples drawn from 49 house publications.² Most of these analysts had participated during the winter quarter of 1952 in the prior study of the time required to compute reading ease scores by the Flesch method and by the F, J, and P simplified method. Stress was now placed on accuracy in counting syllables, one-syllable words, and sentence length, as well as in the use of the Farr and Jenkins table (1) and the Farr, Jenkins, and Paterson table (2). One member of each pair used the old formula and the new formula in analyzing a given hundred-word sample, and the other member of each pair did likewise for the same hundred-word sample. The 14 analysts formed 13 pairs, and each pair, on the average, analyzed about 15 hundred-word samples. It is recognized that this procedure would produce lower reliability coefficients than would have been the case if one pair of experienced analysts had analyzed all 196 samples. It was anticipated, however, that the emphasis on accuracy in all the operations would tend to produce acceptable reliability data.

Data from Books. One analyst³ undertook to compute reading ease scores by both formulas for each of 196 hundred-word samples drawn from 28 books. Then, at a later date, this same analyst recomputed the data for 77 of the 196 samples, drawn from 11 books. In this way, a basis was provided for computing test-retest reliability coefficients for the 77 samples.

2 Graduate students in Mr. Paterson’s Seminar in Applied Psychology participated in the study. The work was done under the immediate supervision of George W. England, who also assumed responsibility for the preparation of the statistical constants and the reliability coefficients. The writers are grateful to the following students: Robert C. Becker, Sarah Ruth Cook, Ellen A. Corcoran, George W. England, Benno G. Fricke, Richard S. Hatch, Sulo N. Havumaki, Benjamin Lasoff, Raymond C. Lee, Jr., Paul W. Maloney, Ernest L. McCollum, Arthur C. McKinney, Charles Newstrom, and Margaret Thomas.

3 Margaret Thomas conducted this phase of the investigation.

Table 1

Means, Standard Deviations and Reliability Coefficients for Analyst to Analyst Study of the Flesch and the Farr, Jenkins, and Paterson Simplified Reading Ease Formulas Applied to House Organs

                               Analyst 1        Analyst 2        r (Analyst 1
                               Mean    S.D.     Mean    S.D.     versus Analyst 2)
Sentence Length                20.3     7.0     20.3     7.1     .90
Syllable Length               158.7    15.3    159.6    15.6     .97
No. of One-Syllable Words      62.4     7.6     62.4     7.9     .95
Flesch R. E. Score             51.7    16.1     51.2    15.9     .96
F, J and P R. E. Score         47.8    14.1     48.0    14.1     .93

Note: N = 196 hundred-word samples drawn from 49 House Organs with counts and computations made by 13 pairs of analysts.


Results 


Data from House Organs. The statistical constants and the reliability coefficients* are presented in Table 1. It will be noted that the means and sigmas obtained by each analyst in a pair are quite close. The reliability coefficients shown in column 4 of Table 1 are all .90 or higher. As was true in the Hayes, Jenkins, and Walker study (6), the reliability of computing average sentence lengths per hundred-word sample is lower than that of making the syllable counts. Furthermore, the evidence shows that total syllable counts and counts of one syllable words per hundred-word sample are made with a gratifyingly high degree of reliability (.97 and .95 respectively). The reliability coefficients of reading ease scores per hundred-word sample, whether computed by the original Flesch method or by the simplified F, J, and P method, are likewise high (.96 and .93 respectively). These reliability coefficients compare favorably with those reported by Hayes, Jenkins, and Walker (6) for the original Flesch formula. Thus, no real loss in reliability has resulted from the introduction and use of the F, J, and P simplified formula.

* The reliability coefficients in Table 1 may be thought of as “alternate form reliability coefficients” since they are “analyst to analyst” reliability coefficients.
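Analyst-to-analyst reliability of this kind is the Pearson correlation between two analysts’ scores for the same samples. The following self-contained sketch shows the computation. The six pairs of RE scores are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Flesch RE scores for the same six samples, one list per analyst.
analyst_1 = [52.0, 67.5, 38.2, 45.9, 71.3, 30.4]
analyst_2 = [50.8, 68.9, 40.1, 44.7, 70.2, 33.0]
print(round(pearson_r(analyst_1, analyst_2), 3))
```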

Data from Books. The statistical constants 
and the reliability coefficients for a single 
analyst are presented in Table 2. Again, the 
means and sigmas of the first count or com- 
putation (test) and of the second count or 
computation (retest) by this analyst are 
quite close. Of more importance, however, is 
the fact that these hundred-word samples 
drawn from 11 books represent far more 
heterogeneous materials than was true of the 
samples drawn from the house organs. The 
sigmas in Table 2 when compared with the 
sigmas in Table 1 clearly prove this point. 
The range of the original Flesch reading ease 
scores for these 11 books was from 100 for 
“Fun with Dick and Jane” to 26 for “Per- 
sonality, a Psychological Interpretation.” As 
a matter of fact, the 11 books were selected 
from all the difficulty levels. And, as would 
be expected, the test-retest reliability coeffi- 
cients are much higher. In fact, they ap- 
proach unity. This is due to the combined 
operation of the greater heterogeneity of ma- 
terials sampled and having only a single 
“compulsive” analyst make all counts and 
computations. The results closely approxi- 
mate the high analyst to analyst reliability 
coefficients reported by Hayes, Jenkins, and 
Walker (6). 
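The point that heterogeneous material raises reliability follows from the classical definition of reliability as the share of observed-score variance that is not error: r = 1 − σ²(error) / σ²(observed). The sketch below assumes a hypothetical counting-error S.D. of 3.2 RE points, held constant across materials. It uses the observed Flesch S.D.s reported for the house organs (16.1, Table 1) and the books (24.4, Table 2). It isolates only the range effect. In the study itself, the book data also came from a single analyst, which contributes as well.

```python
def reliability(observed_sd: float, error_sd: float) -> float:
    """Classical reliability: share of observed variance not due to error."""
    return 1.0 - (error_sd ** 2) / (observed_sd ** 2)


ERROR_SD = 3.2  # hypothetical counting-error S.D. in RE points, same for both materials
house_organs = reliability(16.1, ERROR_SD)  # Flesch S.D., Table 1
books = reliability(24.4, ERROR_SD)         # Flesch S.D., Table 2
print(round(house_organs, 3), round(books, 3))
```

With the same counting error, the wider spread of the book samples alone yields the higher coefficient.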
Table 2

Means, Standard Deviations and Test-Retest Reliability Coefficients for a Single Analyst Study of the Flesch and the Farr, Jenkins, and Paterson Simplified Reading Ease Formulas Applied to Books

                               First Count or        Second Count or
                               Computation (Test)    Computation (Retest)    Test-Retest
                               Mean     S.D.         Mean     S.D.           r
Sentence Length                20.0     10.7         19.8     10.6
Syllable Length               146.7     19.6        146.6     19.7
No. of One-Syllable Words      69.4      8.7         69.6      8.5
Flesch R. E. Score             61.7     24.4         62.4     24.7
F, J, and P R. E. Score        58.7     21.7         59.0     21.6

Note: N = 77 hundred-word samples drawn from 11 books with all counts and computations made by a single analyst.

Intercorrelations between Original and 
Simplified Formulas 


Two intercorrelations between the original and simplified RE scores for 196 samples from 49 house publications were computed: (a) for analyst 1, r was +.84; and (b) for analyst 2, r was +.87. The intercorrelation between the original and simplified RE scores for 196 samples from 28 books for a single analyst was +.94, and for the 77 samples from 11 books for the same analyst, r was +.97. The intercorrelation between the original and simplified RE scores for the averages of the 28 books (seven hundred-word samples each) was +.97. Thus, the original and the simplified RE scores are comparable when computed by a single, fairly experienced and compulsive analyst.


Summary 


The reliability of the original and the simplified Flesch reading ease formulas is reported, based on (a) samples drawn from house organs, using 13 pairs of relatively inexperienced analysts, and (b) samples drawn from books, using a single, more experienced analyst. The findings confirm the earlier reliability study by Hayes, Jenkins, and Walker (6) and show that both the original and the simplified Flesch reading ease formulas are highly reliable. With heterogeneous materials and a single “compulsive” analyst, test-retest reliability coefficients from +.95 to +.99 were obtained. Intercorrelations between the original and simplified formulas are likewise “high.”


Received February 10, 1953. 
Early publication. 
Whether readability formulas can be used to predict more or less success in all printed communication is not known. Some authors suggest their formulas will discriminate among articles expected to get more or less readership, understanding, etc. Dale and Chall (1) use the title, A formula for predicting readability. Flesch (2) says more readable writing will “appeal to readers” and cites an experiment by Swanson (9) in which readership was the criterion. However, the excellent bibliographies of Hotchkiss and Paterson (5), Flesch (2), and Klare (6) show few validation studies using comprehension and retention as criteria.

In their pioneering study Gray and Leary 
(4) found 24 factors of style related to 
reading comprehension of adults. Gray and 
Leary, Dale and Chall, and Flesch reduced 
these to a few factors. Their findings agreed 
on word difficulty and sentence length. 
Flesch also used personal references in one of 
his formulas. 

In two experiments with articles in a mid- 
western farm paper Ludwig (7) varied one 
factor at a time, word difficulty and personal 
references. His test articles were each read 
by more than 40 per cent of the two samples 
of farmers. Readership differences between 
the experimental pairs of articles were small 
and were not significant. 

Analysis of Ludwig’s findings suggested 
several hypotheses: 

Readability factors would have maximum effect when two or more positively related factors were varied. Easier words and shorter sentences, for example, should result in increases of comprehension, other things being equal.

* Grateful acknowledgment is made to the Graduate School, University of Minnesota, for the research grant to finance preliminary analysis, field work, and part of the analysis of results. Intensive analysis of content and information tests was supported under Contract N60NR-246, T.O. 4, Office of Naval Research, with the senior author as responsible investigator. Aid was provided by staffs of the Industrial Relations Center and Research Division, School of Journalism, University of Minnesota, and by Drs. James J. Jenkins and Robert L. Jones. The senior author is indebted to Dr. George M. Klare, University of Illinois, for his critique of the analysis.

Harland G. Fox

Industrial Relations Center,
University of Minnesota

and

Where more than 40 per cent of an audience selects and reads an article, smaller gains in effect can be expected from improved readability. Conversely, the smaller the proportion of an audience that reads an article, the more gains may come from increased readability.

Motivational factors inherent in content, such as subject matter, are probably more important, generally, than readability where individuals select what they want to read and learn from printed media. For example, comic strips are easy to read but vary widely in readership, or audience interest. One comic strip may reach 70 per cent of an audience and another strip in the same day’s newspaper may reach 20 per cent of the same audience.

Readability factors might be more im- 
portant than motivational factors where in- 
dividuals are required to read and study and 
are tested on their learning. This would be 
the case in classroom and training situations. 


The Present Experiment 


In this study easier and harder versions of 
12 articles were published in three issues of 
a paper sent monthly to employees of a mid- 
western company. Four articles appeared 
each month. The 296 employees were ran- 
domized into two groups, “easy sample” and 
“difficult sample.” 

Easy sample received copies of the news- 
paper with easier versions of the 12 articles. 
Difficult sample received the same newspaper 
with the harder versions. 
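The text states only that the 296 employees were randomized into two groups and does not describe the procedure. The sketch below shows one way such an assignment could be done. It assumes an even split of 148 per group, uses hypothetical employee numbers 1 to 296, and fixes the seed so the assignment can be reproduced. It is an illustration, not the authors’ procedure.

```python
import random
from typing import List, Sequence, Tuple


def randomize_two_groups(ids: Sequence[int], seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle the IDs and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


# Hypothetical employee numbers 1..296.
easy_sample, difficult_sample = randomize_two_groups(range(1, 297), seed=1953)
print(len(easy_sample), len(difficult_sample))
```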

The 12 articles concerned company prod- 
ucts, company history, safety program, and 
the working agreement which covered wages, 
hours, and working conditions. 

Effects of the versions were determined by these criteria: (1) retention, measured by a 43-item test of multiple-choice questions based on the 12 articles; (2) readership, measured on easier or harder versions of two articles; and (3) comprehension, measured by a 10-item test given before and after exposure to easier or harder versions of two articles.

Four other instruments were used. They 
involved general opinions about company and 
union, general satisfaction with one’s job 
(11), Sanford’s authoritarian-equalitarian 
scale (8), and Goossen’s disguised intelligence 
test (3). 

Each subject followed this sequence in an 
interview: (1) Took the 43-item information 
test; (2) Read easier or harder versions of 
two articles; (3) Reported whether he had 
read the two articles when they appeared in 
the company newspaper. (Sixty per cent had 
read the articles. Actually these subjects 
were reading the articles a second time in the 
comprehension test.); (4) Took 10-item in- 
formation test on the two articles. (The 10 
items were included in the 43-item test.); 
and (5) Answered four questionnaires on general opinions about company, union, and job, authoritarian-equalitarian personality aspects, and intellectual ability.


Readability Differences 


Three questions concern changes from 
harder to easier versions. What were the 
differences in readability? Could some fac- 
tors decrease comprehension and so decrease 
positive effects of other factors? Did easier 
and harder versions have the same informa- 
tion content? 

The following readability differences ap- 
peared. 

Formula scores. By the Flesch formula, 
the easier versions had a mean score of 73 
(fairly easy) whereas the harder versions 
scored an average of 59 (fairly difficult). 
The Dale-Chall formula gave similar results. 
The easier versions had a mean Dale-Chall 
score of 7th-8th grade compared with a score 
of 11th-12th grade for the harder versions. 

Number of words. The easier versions had 
fewer words, an average of 284, while the 
harder versions had an average of 332 words. 
The easier versions totaled 3,410 words and 
the harder versions totaled 3,983 words. 
Flesch human interest index. The easier
versions had a mean score of 46 (very in- 
teresting) and the harder versions a mean 
of 17 (mildly interesting). 

Sentence length. The easier versions had 
an average sentence length of 13.0 words. 
The harder versions had an average sentence 
length of 19.4 words. 

Syllables per 100 words. The easier ver- 
sions had 142 syllables per 100 words and the 
harder versions 161 syllables per 100 words. 
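The Flesch scores reported above come from Flesch's reading-ease formula, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). A minimal Python sketch that plugs in the averages reported for the easier versions; it is an approximate check, not the study's own computation:

```python
def flesch_reading_ease(words_per_sentence: float, syllables_per_word: float) -> float:
    """Flesch reading-ease score; higher values indicate easier text."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Averages reported for the easier versions: 13.0 words per sentence and
# 142 syllables per 100 words (i.e. 1.42 syllables per word).
easy_score = flesch_reading_ease(13.0, 142 / 100)
print(round(easy_score, 1))  # 73.5
```

The result, about 73.5, is close to the reported mean of 73. Because the study scored each article separately, a score computed from pooled averages need not match a reported mean exactly.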

Unfamiliar words. As scored against the Dale-Chall list of 3,000 familiar words, the easier versions had 11.6 per cent unfamiliar words, whereas the harder versions had 20 per cent unfamiliar words.
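The Dale-Chall grade placements can be approximated with the 1948 Dale-Chall formula as it is commonly stated: raw score = 0.1579 × (per cent of words not on the Dale list) + 0.0496 × (average sentence length) + 3.6365, converted to a grade band with the usual correction table. A Python sketch under those assumptions (the band labels follow the conventional table and are an assumption here), applied to the easier versions' averages:

```python
def dale_chall_raw(pct_unfamiliar: float, words_per_sentence: float) -> float:
    """Dale-Chall raw score from the percentage of words not on the Dale list
    of familiar words and the average sentence length in words."""
    return 0.1579 * pct_unfamiliar + 0.0496 * words_per_sentence + 3.6365


# Conventional correction table: (exclusive upper bound of raw score, grade band).
GRADE_BANDS = [
    (5.0, "4th grade and below"),
    (6.0, "5th-6th grade"),
    (7.0, "7th-8th grade"),
    (8.0, "9th-10th grade"),
    (9.0, "11th-12th grade"),
    (10.0, "13th-15th grade"),
]


def grade_band(raw: float) -> str:
    """Return the first grade band whose upper bound the raw score falls below."""
    for upper, label in GRADE_BANDS:
        if raw < upper:
            return label
    return "college graduate"


# Easier versions: 11.6 per cent unfamiliar words, 13.0 words per sentence.
raw = dale_chall_raw(11.6, 13.0)
print(round(raw, 2), grade_band(raw))  # 6.11 7th-8th grade
```

The easier versions land in the 7th-8th grade band, matching the placement reported for them.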

Verbs and adjectives. The easier versions 
had 130 verbs per 100 adjectives. The harder 
versions had 89 verbs per 100 adjectives. 

This study could not answer the question 
of whether some of these or other readability 
factors cancelled out comprehension gains. 
No previous research had been published at 
the time of designing the study to suggest 
this possibility. The Gray-Leary and Swan- 
son investigations indicated that readability 
factors such as these would combine for posi- 
tive effects. 

In the opinion of three judges the informa- 
tion content of the easier and harder ver- 
sions was the same. They used the multiple- 
choice questions as aids to their judgments. 
No method was known to the investigators 
by which information content could be classi- 
fied and its similarity between easier and 
harder versions defined quantitatively. 

Differences in effects of easier or harder 
versions could not be attributed to subject 
matter. 

Whether differences could be attributed to 
fewer words used in easier versions is a ques- 
tion of whether details were amplified in the 
harder and longer versions. Wilson (10) 
used versions 300, 600 and 1,200 words in 
length. She found that amplification was 
helpful only where the reader had difficulty 
with concepts. Any advantage in this re- 
spect might be in favor of the longer versions. 
However, the investigators believed that the information content and amount of amplification were held constant. Again, no quantitative method was devised to permit other investigators to check this point.


Characteristics of the Two Samples 


The two samples were interviewed under 
the same conditions by the same group of 
interviewers in the company’s dining hall. 
A total of 130 interviews was completed (67 
easy sample and 63 difficult sample). 

Attrition of the original population of 296 employees was due to several factors. A total of 96 were “laid off”; 6 quit; the remainder were on the night shift, on vacation, ill, or illiterate. No significant differences between the samples could be attributed to these factors.

The two samples did not differ significantly 
on the following social characteristics: 

Social and individual. Age, sex, years of schooling, marital status, mean scores on the authoritarian-equalitarian scale, and the Goossen disguised intelligence test.

Job and union. Years with the company, 
years in current job, union membership, years 
in the union, readership of a union paper, and 
opinions about company, union, and job. 

Easy sample had more employees high and 
low in intellectual ability as measured by the 
Goossen disguised intelligence test. Mean 
test scores, however, did not differ signifi- 
cantly. 

Compared with the general population of American adults, these two samples of 128 employees included more females (60 per cent) and more younger persons (36 per cent aged 20 to 30), and had more schooling (55 per cent with some high school and 13 per cent with some college).

More than 60 per cent had worked more 
than five years for this firm and 65 per cent 
were union members. Of those who were 
union members two-thirds had been union 
members more than five years. 


Results 


Retention. The two samples did not differ significantly in mean scores on the 43-item test based on information in both versions of the 12 articles.

Item analysis showed the two samples did not differ significantly on 37 of the 43 information questions. Of the six items where differences were significant, easy sample had higher scores on two and difficult sample had higher scores on four.

No consistent patterns appeared in kinds 
of items on which one sample succeeded more 
than the other. Easy sample had higher 
scores on questions about annual sick leave 
and a provision of the working agreement. 
Difficult sample had more success on two 
items about company history, the “cause” of 
hard water, and the name of an official who 
bargained with the union. 

The two samples did not differ in remem- 
bering information from easier or harder 
articles. The formulas did not appear to 
measure factors in the articles related to dif- 
ferences in retention. 

Readership. The two samples did not differ significantly in readership. Of easy sample, 65 per cent (n = 67) read both articles; of difficult sample, 61 per cent (n = 63) read both articles.

The easier versions did not reduce the proportion who read neither article. Of easy sample, 22 per cent did not read either article, compared with 29 per cent of difficult sample.

Neither the readability formulas nor the 
Flesch human interest index seemed to meas- 
ure factors in the articles related to differ- 
ences in readership. 

Comprehension. Easier and harder ver- 
sions of the two articles used in the test of 
readership also were used to test compre- 
hension. Subjects read easy or difficult ver- 
sions of the two articles in the test situation; 
immediately after reading they answered 10 
questions based on the two articles. These 
10 questions had been included in the initial 
43-item test. Mean scores on this 10-item 
test, before and after reading in the test 
situation, are shown in Table 1. 

Easy sample did significantly better than 
difficult sample on the 10-item after-reading 
test. However, the two samples did not 
differ significantly on the before-reading test. 
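To see how the Table 1 means and standard deviations translate into a significance test, here is a sketch of a pooled-variance two-sample t statistic in Python. The paper does not say which form of t test was used, and the values are read from a flattened table layout, so treat this as an approximate check:

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with a pooled estimate of the common variance."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Table 1: easy sample (n = 67) versus difficult sample (n = 63) on the 10-item test.
t_after = pooled_t(8.03, 1.91, 67, 7.16, 1.85, 63)
t_before = pooled_t(5.25, 1.82, 67, 4.87, 1.78, 63)
degrees_of_freedom = 67 + 63 - 2
print(f"after: t = {t_after:.2f}, before: t = {t_before:.2f}, df = {degrees_of_freedom}")
```

The after-reading statistic (about 2.64) exceeds the two-tailed 1 per cent critical value of roughly 2.61 for 128 degrees of freedom, while the before-reading statistic (about 1.20) falls short of the 5 per cent value of roughly 1.98, which agrees with the text.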

This result indicates that the readability 
formulas did measure factors in the articles 
which related to differences in comprehen- 
sion. 

Table 1

Mean and Variance Significance Tests for Sample Easy and Sample Difficult on Information Tests
Note: Sample Easy N = 67; Sample Difficult N = 63.

Variables                            Mean,         Mean,              S.D.,         S.D.,
                                     Sample Easy   Sample Difficult   Sample Easy   Sample Difficult
43-Item Information Test             20.93         22.29              5.96          5.54
10-Item Test Before Reading           5.25          4.87              1.82          1.78
10-Item Test After Reading            8.03          7.16              1.91          1.85
Gains in Correct Response on
  10-Item Test After Reading          2.78          2.30              1.72          1.77

** Significant at the 1% level.

An analysis of the 10 items showed that easy sample made consistent gains in comprehension over difficult sample. None of the gains appeared important except for one item. On this question easy sample showed four times as high a gain (54 per cent) in correct responses as difficult sample (13 per cent).

The evidence, both qualitative and quantitative, showed that readability indices could be used to predict differences in comprehension between two versions of the same material.

Readers vs. non-readers. In the readership test of two articles, about two-thirds of the 128 employees read both articles. The remainder either ignored both items or read one. Would readers have higher information scores than non-readers? Obviously, much information could have been learned from personal experience or other sources. Yet one might expect readers to know more; the reading behavior might be symptomatic of efforts to learn similar information from other sources.

By the reading criterion, subjects were divided into three groups: 80 who had read both test articles in the company newspaper; 15 who had read one; 33 who had read neither. The two extreme groups, readers and non-readers, were compared.

On the 43-item information test the readers had a mean of 23.5 items, or 55 per cent, correct. Non-readers had a mean of 18.5, or 43 per cent, correct. This difference was highly significant (t = 5.11).

Item analysis (by reader and non-reader) showed 11 significant differences. In each case readers were more successful.

Of the remaining 32 items readers had a 
higher proportion of correct response on 29 
items. By the sign test, this was a highly 
significant difference. 
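The sign test treats each item as a coin flip under the null hypothesis that readers and non-readers are equally likely to do better. A Python sketch of the exact binomial version, assuming the other three of the 32 items favored non-readers rather than being ties (the paper does not say):

```python
from math import comb


def sign_test_upper_tail(successes: int, trials: int) -> float:
    """Exact one-sided sign test: P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


# 29 of the 32 remaining items favored readers.
p_one_sided = sign_test_upper_tail(29, 32)
p_two_sided = min(1.0, 2 * p_one_sided)
print(f"one-sided p = {p_one_sided:.2e}, two-sided p = {p_two_sided:.2e}")
```

Either way the probability is on the order of one in a million, far below the 1 per cent level, consistent with calling the result highly significant.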

Readers had significantly higher mean 
scores than non-readers on the 10-item test 
before but not after reading the two test 
articles. The non-readers gained more in 
comprehension. From before to after read- 
ing, the non-readers gained on the 10 items 
an average of 2.9 items correct, compared 
with 2.2 for readers. This was a significant 
difference (t = 2.00). 

Whether readers had more intellectual
ability than non-readers became an important 
question. They did not differ in years of 
schooling or in Goossen disguised intelligence 
test scores. This suggested that readers might 
differ from non-readers on other social char- 
acteristics which could explain differences in 
motivation, or interest in the material. 

Ten factors were analyzed for clues to 
differences in motivation between readers and 
non-readers. These were age, sex, years 
with the company, years on the specific job, 
union membership, years in the union, read- 
ership of a union paper, general opinions 
about company and job, authoritarian-equa- 
litarian score, and union activity. Readers 
and non-readers did not differ on these fac- 
tors. No characteristic discriminated be- 
tween those employees more and less mo- 
tivated to read and learn information from 
the company newspaper. 
Table 2

Mean and Variance Significance Tests for Readers and Non-Readers on Information Tests
Note: Readers N = 80; Non-Readers N = 33.

Variables                            Mean,     Mean,         S.D.,     S.D.,         F
                                     Readers   Non-Readers   Readers   Non-Readers
43-Item Information Test             23.46     18.50         5.98      4.83          1.54
10-Item Test Before Reading           5.56      4.36         1.74      1.66          1.12
10-Item Test After Reading             …         …           2.21      1.75          1.61
Gains in Correct Response on
  10-Item Test After Reading           …         …           1.86      1.61          1.36

* Significant at the 5% level.
** Significant at the 1% level.


Summary and Discussion


When easier and harder versions of 12 articles were printed in three monthly issues of a company newspaper and two samples of 128 employees were tested, it was found:

1. Subjects exposed to harder versions succeeded as well on a 43-item information test as those exposed to easier versions.

2. Harder versions succeeded as well as easier versions in attracting readers to two articles.

3. Subjects who read easier versions of two articles in a test situation did significantly better on a 10-item test of comprehension than those who read harder versions.

This result indicates that readability formulas can predict some differences in comprehension between versions of the same material.

4. Readers of two articles were more successful on the 43-item test of information in the 12 articles than those who had not read either of the two articles tested for comprehensibility.

These results indicate that readability formulas can be used to predict differences in comprehension between two versions of the same material. However, the findings do not support the utility of such formulas in predicting differences in readership and retention for similar material, conditions, and time periods. Even combined treatment of readability factors, such as was attempted in this study, did not influence retention.

One factor limiting these results is the relatively high interest (readership by 60 per cent of the samples) in two of the articles.

The lack of differences in retention between easier and harder versions suggests that investigation of motivational factors inherent in content is most crucial where individuals select what they want to read and learn. This does not gainsay the possibly greater importance of readability where individuals are required to read and study, as in classroom and training situations.


Received April 4, 1952. 
There are probably no individuals engaged
in measuring attitudes or public opinion who 
would not agree that it is wise to pre-test 
questionnaires. Many would probably say 
that the conventional pre-tests are conducted 
efficiently and result in well designed and 
adequately worded questionnaires. About 
this latter point there is some doubt. 

Several years ago this writer conducted 
a pilot study of respondent comprehension 
using a battery of “typical” opinion ques- 
tions. The results of this study seem to 
shed some light on the question of the ade- 
quacy of our present pre-testing methods. 


Procedure 


Nine questions were chosen from “The Quarter’s Polls” and were presented to a
randomly selected group of 48 middle-income 
respondents in Cincinnati and Centerville, 
Ohio. The questions were selected to cover 
a wide range of reading difficulty as judged 
by the Flesch readability formula. Of the 
nine questions, one had a difficulty equal to 
the adult average level as defined by Flesch, 
four were above and four below this level of 
difficulty. A second criterion for the selec- 
tion of questions was that they be of topical 
interest to the respondents at the time this 
study was being conducted. 

To test the respondents’ comprehension of 
the questions, a rather simple procedure was 
used. The question was presented to the 
respondent and after his answer had been 
given he was asked to repeat in his own 
words the meaning of the question as nearly 
as he could. The interviewer then recorded 
the respondent’s interpretation verbatim. 
The order of question presentation was varied 
from respondent to respondent. 

There are a number of criticisms that one could level against this method of measuring comprehension. It may be argued that merely because a person can parrot a question, it does not necessarily follow that he comprehends its meaning. On the other hand, if a respondent gives a faulty interpretation, it seems fairly safe to conclude that he did misinterpret the question. The method would therefore probably lead to an underestimation of miscomprehension, certainly not an overestimation.

1 Flesch, R. The art of plain talk. Harper and Brothers, 1946.

The respondents’ interpretation of each of 
the questions was judged to fall into one 
of four categories: (a) correct interpreta- 
tions, leaving out no vital parts; (b) gen- 
erally correct replies, or replies in which no 
more than one of the parts was altered or 
omitted; (c) partially wrong interpretations, 
but showing the respondent knew the gen- 
eral subject of the question; (d) completely 
wrong interpretations or no-response. As 
an example of the scoring, take the question:
“Suppose the government had no control 
over how the businesses are run in this coun- 
try, who do you think this would help the 
most—the people as a whole, or those who 
run big businesses, or those who run small 
businesses?” 

A partially correct interpretation was: “If 
there weren’t any control, which would have 
the greater power—the small business or the 
larger.” 

A partially wrong interpretation was: 
“Bout government owning business—who 
would benefit most, big businesses or small 
businesses.” 

Or: “Just who would get the business— 
the big guy or the little guy?” 

An example of a completely wrong inter- 
pretation was: “Something about having a 
President. If he does things that people 
don’t agree with, they have a right to tell 
him—like Walter Winchell.” 

The responses to the questions were judged individually by each of two judges. In case of disagreement, the response was discussed until agreement could be reached as to which interpretation category it belonged to.
Results


There were 430 question interpretations of 
which 73, or 17.0 per cent, were either wholly 
or partially wrong. Two respondents did not 
make an interpretation of one question be- 
cause, in one case, the telephone rang and, 
in the other case, something was boiling 
over on the stove. 
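The total of 430 interpretations follows from the design: 48 respondents each interpreted nine questions, less the two interrupted interpretations. A short Python check of that count and of the 17.0 per cent figure:

```python
respondents = 48
questions = 9
interrupted = 2  # one telephone call, one pot boiling over

interpretations = respondents * questions - interrupted
wrong_or_partly_wrong = 73
share_wrong = wrong_or_partly_wrong / interpretations

print(interpretations, f"{share_wrong:.1%}")  # 430 17.0%
```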

The findings would not be startling if one 
could say that these questions have now 
been pre-tested and can be re-worded so as 
to make them more comprehensible. How- 
ever, the questions used in this study had 
already been presented to large cross sec- 
tions of the general public by well known 
polling organizations. That is, these are 
questions after they presumably have been 
subject to the usual pre-test. 

If the questions had been asked of the 
respondents and only their answers recorded 
in the usual way, these errors of comprehension would not have been detected. In no
instance did a respondent say that he did 
not hear a question or that he misunderstood 
it. The questions were asked, answers given, 
and all seemed well. 

If one grants that some degree of respond- 
ent comprehension may be missed in the 
usual pre-test, it still may be asked if this 
error contributes to any inaccuracy in poll 
results. From this study, the answer seems 
fairly clear. Four questions contributed by 
far the most to the total amount of mis- 
comprehension. In two of these questions, 
there was a marked and statistically signifi- 
cant tendency for those not comprehending 
to reply “don’t know.” On one question there
was a significant tendency for the non-com- 
prehending respondent to answer “approve” 
to a question dealing with the United Na- 
tions. This institution was enjoying a high 
degree of popularity at the time this study 
was conducted, and hence likely to elicit a 
favorable stereotype from a respondent hear- 
ing the words “United Nations” but mis- 
comprehending the intent of the question. 
In this instance, the question was inquiring 
about placing atomic energy under UN control. There was no tendency evident for those miscomprehending the remaining question to reply differently from the rest of the sample. This in itself might be damning to that question.

Table 1

Relation Between Readability and Comprehensibility of Nine Opinion Poll Questions

Estimated Reading Grade Placement (Flesch Score)    No. of Miscomprehensions
 5.8                                                  1
 6.1                                                  1
 7.2                                                 14
 7.6                                                 14
 8.5                                                 14
11.0                                                  1
12.8                                                 14
14.0                                                  3
17.2                                                 11

Because the sample of questions is small, 
and because several of the questions received 
equal miscomprehension scores, the correla- 
tion between comprehension and readability 
has not been presented here. This study 
was not designed to be a validation of the 
Flesch index; however, since there may be 
some interest in the relationship found, the 
Flesch score of each question and the num- 
ber of persons miscomprehending the ques- 
tion are presented in Table 1. The num- 
ber miscomprehending the questions included 
those making wrong and partially wrong in- 
terpretations. 
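Table 1 can be cross-checked against the text: its miscomprehension counts should sum to the 73 wholly or partially wrong interpretations, and the four largest counts presumably correspond to the four questions that contributed most to miscomprehension. A Python sketch of that check:

```python
# (Estimated reading grade placement from the Flesch score, number miscomprehending)
table_1 = [
    (5.8, 1), (6.1, 1), (7.2, 14), (7.6, 14), (8.5, 14),
    (11.0, 1), (12.8, 14), (14.0, 3), (17.2, 11),
]

counts = [misses for _, misses in table_1]
total_misses = sum(counts)
top_four = sum(sorted(counts, reverse=True)[:4])

print(total_misses, top_four, f"{top_four / total_misses:.0%}")  # 73 56 77%
```

The four questions with 14 miscomprehensions each account for about three-quarters of the total, which fits the statement that four questions contributed by far the most.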


Summary 


From the results noted here, it would 
seem that conventional pre-testing fails to 
uncover many questions that are later mis- 
interpreted by respondents in the main sur- 
vey. And it would seem that the failure to 
word some questions so as to bring respond- 
ent comprehension to a maximum may re- 
sult in distortion of the survey results. 
Hence, a few extra minutes spent gaining 
some rough measure of comprehensibility of 
the questions may well pay ample dividends 
in increased survey accuracy. 


Received May 14, 1952. 
Most of us have had the experience of
being called upon unexpectedly to give an 
opinion about some question, state a course 
of action, or criticize some proposal in an 
intelligent manner. It is possible that in 
such situations we made replies that we later 
recognized as missing the point, as not fully 
expressing our position, or that would have 
been more valuable if we could have thought 
of this, that, or the other alternative. It is 
conceivable that a large proportion of re- 
spondents to the typical opinion poll find 
themselves in a similar position. The re- 
spondent may give a forced answer to the 
persistent probing of the interviewer. How- 
ever, after the interviewer has gone these re- 
spondents may recall many pertinent bits of 
information or opinion that would clarify, 
amplify, or even change their original posi- 
tion. These additional remarks on the part 
of the respondent should be of some interest 
in the analysis of opinion. 

This study was undertaken to determine 
the effects of forewarning the respondents 
of a “typical” opinion poll of the purpose and 
nature of the approaching interview. 

It is hypothesized that forewarning by 
means of an introductory letter will give the 
respondent an opportunity to think about 
and discuss the various topics listed in the 
letter and so be prepared to give more de- 
tailed and thought out answers than he would 
with no such opportunity. It is also hy- 
pothesized that by forewarning, the respond- 
ent will be more prepared to cooperate with 
the interviewer and therefore make the in- 
terview more enjoyable, both from the in- 
terviewer’s and the respondent’s point of 
view. 

* This study is part of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Pennsylvania State College. The study was completed while the investigator was a fellow in psychology of the Britt Foundation.


Method 


Two community surveys, one in Altoona 
and the second in Williamsport, Pennsylvania, were conducted to test these hypotheses. The samples were drawn from the most
recent city directory on an every nth dwelling 
unit basis. One of these sub-samples, com- 
prising 60 per cent of the total sample in 
each city, was designated for the sending 
of the forewarning letter. Letters were sent 
to more than one-half of each sample to 
allow for the normal number of substitutions 
and refusals. The letters were sent so as to 
be received at least three days before the 
interview. 
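The paper specifies only an every-nth-dwelling-unit draw from the city directory and a 60 per cent forewarned sub-sample. A Python sketch of one way to do this, assuming a hypothetical directory list, a random start, and a random 60 per cent allocation (the paper does not say how the sub-sample was chosen):

```python
import random


def systematic_sample(frame, n, rng):
    """Take every k-th unit after a random start, with k = len(frame) // n."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def split_subsample(units, share, rng):
    """Randomly allocate `share` of the units to the first group."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    cut = round(share * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(1950)
# Hypothetical directory of 12,000 dwelling units.
directory = [f"dwelling-{i:05d}" for i in range(12_000)]
sample = systematic_sample(directory, 600, rng)  # 600 possible interviews, as in Altoona
forewarned, not_forewarned = split_subsample(sample, 0.60, rng)
print(len(sample), len(forewarned), len(not_forewarned))  # 600 360 240
```

Because the start is drawn below k, the last index is at most n × k - 1, so the draw never runs past the end of the directory.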

The forewarning letter read as follows: 


Dear Residents of Altoona: 


Many things, both big and small, are impor- 
tant to you in deciding whether or not a city is a 
good place in which to live. In an effort to make 
Altoona a better place in which to live, a study 
is being made by Pennsylvania Surveys at the re- 
quest of the Altoona Chamber of Commerce. 

To meet this aim it is important for you, as 
residents of Altoona, to speak your thoughts and 
opinions on several topics of community interest. 
Only you and your neighbors can paint a true pic- 
ture of your city. We feel sure that you will co- 
operate to help make this project a success. 

On Tuesday, November 28, a representative of 
Pennsylvania Surveys will call on you at your 
home. He, or she, will ask: 

About the transportation service, within Altoona, and into and out of Altoona.

About business and industry in Altoona.

About services provided by the city government of Altoona.

About the amount and kind of public recreation available in Altoona.

About the housing situation in Altoona.

About the public schools.

For over-all suggestions that would make Altoona a better place in which to live.


We realize that you are concerned with many 
problems other than those of a purely local in- 
terest. Therefore, we have sent you this letter 
so that when the representative calls you will 
have had some time to think about these problems. We hope you will think about these topics and talk about them with the members of your family before Tuesday.

To get a clear picture of the wishes of the peo- 
ple of Altoona it is important that you do not 
talk about these topics with your friends until 
after the interview. If you discuss these topics 
with people outside your home your opinions
might be biased by what the other people think. 

Your personal replies will be held in the strict- 
est confidence by us at Pennsylvania Surveys. 
What you say will never be identified to any 
other resident of Altoona or to any official of 
Altoona. The only information we give out is, 
for example, the number of people in a hundred 
who have a given opinion on an issue. We never, 
under any condition, tell who it is that has a 
given opinion. 

We wish to thank you again for your coopera- 
tion. 


Very truly yours, 
PENNSYLVANIA SURVEYS 


The letter sent to the respondents in Wil- 
liamsport was identical with the exception 
of the listing of certain of the topics. The 
letters were mimeographed on the letterhead 
of Pennsylvania Surveys. 

The questionnaire used in each city con- 
tained nineteen opinion questions. With only 
minor exceptions the questions used in both 
cities were identical. Included were open- 
end questions, dichotomous and other choice 
questions, and a rating scale. 

Because inclement weather was the rule 
during the period of interviewing (December 
5, and 12, 1950), a number of make-up inter- 
views had to be obtained in each city. In 
all, 527 out of a possible 600 interviews were 
obtained in Altoona, and 479 out of 500 in 
Williamsport. The unfilled interviews were 
randomly scattered throughout each city. 
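The completion figures correspond to the following rates. A trivial Python check:

```python
possible = {"Altoona": 600, "Williamsport": 500}
obtained = {"Altoona": 527, "Williamsport": 479}

# Completion rate = interviews obtained / interviews possible.
rates = {city: obtained[city] / possible[city] for city in possible}
for city, rate in rates.items():
    print(f"{city}: {obtained[city]}/{possible[city]} = {rate:.1%}")
# Altoona: 527/600 = 87.8%
# Williamsport: 479/500 = 95.8%
```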

Forty-two interviewers were used in Al- 
toona and thirty-five in Williamsport. The 
interviewers, for the most part advanced 
college students, were selected because of 
their expressed interest in opinion research. 
They were given detailed oral and written 
instructions on the methods of interviewing 
and on problems connected with this particu- 
lar survey. 


Results 


A rather disconcerting finding was the low 
claimed readership of the letter and the even 
smaller proportions of respondents receiving letters who reported knowing of the survey topics and who discussed the letter with
members of their family. Only 68 per cent 
of the forewarned respondents in Altoona 
and 63 per cent in Williamsport reported 
prior knowledge of the survey. And of all 
the forewarned respondents, only 28 per cent 
in Altoona and 27 per cent in Williamsport 
reported understanding the topics to be cov- 
ered. Nineteen per cent of the forewarned 
in Altoona and 16 per cent in Williamsport 
reported discussing the topics with their 
family. 

There is some evidence that the above 
figures underestimate the letter’s effect. It 
is known from the behavior of the respond- 
ent that in some instances the letter was 
read but not reported. It is also known that 
other respondents knew of the topics to 
be covered, although they disclaimed such 
knowledge. The respondents not reporting 
knowing of the survey were generally in the 
lower socio-economic levels, tended to have 
only a grade school education, and were em- 
ployed in unskilled occupations. 

The results of the analyses by type of re- 
sponse are presented in Table 1. For the 
sake of brevity only the total number of re- 
sponses of a given type to the total of all 
the questions is presented. The forewarned 
group of respondents includes those who re- 
ported knowing of the survey as well as 
those who said they did not. No-answer re- 
sponses have been eliminated as have the 
“don’t know” responses, except for the analy- 
sis of the number of “don’t knows.” 

From Table 1, it is seen that when the 
number of “don’t know” responses given to
open-end questions was analyzed there was 
no difference between the forewarned and 
non-forewarned respondents in Altoona, but 
there was a significant reduction of “don’t 
know” responses given by the forewarned re- 
spondents in Williamsport. However, of 13 
open-end questions analyzed in Williamsport 
only two showed a significant reduction of 
“don’t know” responses among the fore- 
warned respondents, and one of these ques- 
tions covered a topic not listed in the fore- 
warning letter. 

It is seen that in neither city was there a 
Table 1
An Analysis of Various Types of Response to the Questions 








Response Type 


Total Responses 


Response Type 





Altoona: “DK” responses to open-end questions. 
Forewarned
Non-forewarned 
Williamsport: “DK” responses to open-end questions. 
Forewarned 
Non-forewarned 
Altoona: “DK” responses to choice questions. 
Forewarned 
Non-forewarned 
Williamsport: “DK” responses to choice questions.
Forewarned 
Non-forewarned 
Altoona: Stereotyped responses. 
Forewarned 
Non-forewarned 
Williamsport: Stereotyped responses. 
Forewarned 
Non-forewarned 
Altoona: Response reality. 
Forewarned 
Non-forewarned 
Williamsport: Response reality. 
Forewarned 
Non-forewarned 
Altoona: Response extremity. 
Forewarned 
Non-forewarned 
Williamsport: Response extremity. 
Forewarned 
Non-forewarned 


N N 


1,867 
2,394 522 
1,753 
2,236 


342 
502 


4,896 
6,230 


319 
393 


3,789 
4,979 


275 
384 


1,907 
2,351 


1,002 
1,327 


1,737 
2,121 


860 
1,083 


1,343 
1,636 


187 
224 


1,116 
1,382 


151 
246 


1,965 
2,534 


1,571 
2,061 





reduction of “don’t know” responses given to 
the choice questions or to the rating scale. 
The responses to open-end questions were 
rated for stereotypy. An answer was considered to be stereotyped if it was judged to be
an easy answer to give and if it required little 
thought on the part of the respondent. There 
was a significant tendency for the forewarned 
respondents in Altoona to give fewer stereo- 
typed responses. However, this difference 
was not confirmed in the Williamsport sample. 
The responses were also rated for the degree of non-reality that was manifested. A reply was considered to indicate non-reality if it suggested an obviously impossible solution to a problem, indicated that the respondent was dodging an issue, or was so obviously socially wrong that it could never be implemented.

** Difference significant at the 1 per cent level of confidence.

It is seen that there was no difference be- 
tween the number of non-reality responses 
given by the forewarned and non-forewarned 
respondents in Altoona. However, in Wil- 
liamsport there was a significant reduction of 
non-reality answers given by the forewarned 
respondents. 

It was hypothesized that forewarning would 
lead the respondent to be more sure of his 
opinion and hence be willing to endorse a 
more extreme statement of opinion than the 
non-forewarned. As is noted in the table,
this hypothesis was not confirmed in either 
city. 

In both cities there was only a negligible 
tendency for the forewarned respondents to 
give more answers to the open-end questions 
than the non-forewarned. In neither city was 
the difference close to being significant. 

The forewarned group in the preceding 
analyses contained respondents who did not 
report receiving the letter, and those who 
reported not understanding the topics. To 
further test the effects of forewarning, those 
respondents in Williamsport who claimed to 
have received the letter and to have under- 
stood its meaning were selected out of the 
over-all forewarned group. A sample of re- 
spondents was drawn from the non-fore- 
warned group to match as completely as 
possible the informed-forewarned in respect 
to age, sex, socio-economic status, and educa- 
tional attainment. These two groups were 
then analyzed on the same variables as dis- 
cussed above. Only those questions cover- 
ing topics that were mentioned specifically in 
the forewarning letter were included for 
analysis. 


If there was any marked tendency for
these informed-forewarned respondents to
change their responses on the basis of the 
opportunity to discuss and think about the 
topics then it should turn up in this matched 
sample analysis. However, in no instance 
were the hypotheses verified. That is, there 
was no tendency for the informed-forewarned 
group to give more responses to open-end 
questions, fewer “don’t know” responses, less 
stereotyped or fewer non-reality replies, or 
to accept more extreme statements of opinion. 

Returning once more to the full samples, 
the hypotheses concerning respondent co- 
operation were analyzed next. Because some 
of the interviewers failed to record refusals, 
it was impossible to determine the effects of 
forewarning upon the refusal rate. How- 
ever, with data obtained from the original 
sample listings it was possible to analyze the 
rate of substitution at forewarned and non- 
forewarned addresses. A difference of six 
per cent was obtained in the direction of 
fewer substitutions at addresses to which letters had been sent. This difference was significant at the 10 per cent level of confidence.

Interviewer ratings of the respondent’s 
cooperativeness, eagerness to discuss the ques- 
tions, and apparent information showed dif- 
ferences in favor of the hypothesis. In both 
Altoona and Williamsport the forewarned 
respondents were rated significantly more 
cooperative than the non-forewarned. In 
neither city was a significant difference found 
in respect to the respondent’s eagerness to 
discuss the questions, but the differences that 
did exist were in the predicted direction. The 
interviewers did not rate the Altoona fore- 
warned respondents as being more informed; 
however, there was a significant difference in 
favor of the forewarned group in Williams- 
port. 


Discussion 


When considering the questions individ- 
ually it was found that the number of sig- 
nificant differences in this study could have 
been found on the basis of chance alone. 
Secondly, the differences that were found were 
not consistently in the predicted direction, 
nor were they consistent between the two 
cities. The matched-sample study, which 
tested the hypotheses under the most rigor- 
ous conditions, did not disclose any differ- 
ences that would uphold the hypothesis that 
the forewarning letters lead to more meaning- 
ful or a greater number of responses. 

The general lack of effectiveness of the 
forewarning letter may have resulted from 
several uncontrolled variables. It is possi- 
ble that the forewarned respondents, if probed 
more completely or given more time to give 
their answers, would have given more re- 
sponses and responses with more meaning. 
It is also possible that the older and stronger 
opinions, those most likely to be stereotyped, 
would be given first. If the interviewers did 
not record all the answers or were in a hurry 
to get to the next question the effects of 
forewarning might be nullified. However, 
while this may all be true, it is held that 
the interviewers were well motivated, did a 
competent job, and were comparable in ability 
to the typical interviewer used in many 
market research studies. 
Another factor that might account for the
negative findings is the letter. If the letter 
had not been mimeographed, but rather had 
been made more attractive or had spelled out 
the topics in a simpler or more understand- 
able fashion, the respondents might have 
been more motivated to read the letter and 
take the suggested action. On the other 
hand, the letter had been pretested on a 
small sample in Altoona and checked for 
readability. Short sentences, short words, 
and large type were used, hence the letter 
should have been understandable to any 
person who could read a newspaper. While 
a more attractive letter might have secured 
more readers, the results of the matched- 
sample analysis showed readers to respond 
no differently from the non-forewarned. 

The negative findings might have resulted 
because of the nature of the survey content. 
The respondents might have been more in- 
clined to think about and discuss the na- 
tional administration or the most recent base- 
ball trades, rather than purely local issues. 
Here we may have the most logical explanation of the findings. This study may well have served to point out once again the public’s indifference to civic affairs. In many
of our cities well-documented exposés of civic 
maladministration or the pressing need for 
certain improvements fall on an unrespon- 
sive public. It may not be too far-fetched to 
believe that the forewarning letter met a 
similar fate. 

The interviewer ratings are subject to the criticism that they were made after determining whether the respondent had received a forewarning letter. Nevertheless, there is some
subjective evidence that tends to uphold the 
general validity of the ratings. 

The interviewers either volunteered or were 
asked whether or not they had the feeling 
that the forewarning letter made any differ- 
ence in the respondents’ cooperativeness. 
Many of the interviewers reported that they 
felt the letter did help in securing rapport 
and no interviewer reported that the letter 
made the respondent more suspicious or un- 
cooperative. The interviewers claimed that 
they could predict the forewarned respondent 
with some accuracy before asking for knowl- 
edge of the survey. Moreover, the inter- 
viewers were not told the purpose of this 
study. They knew that some respondents 
would know of their coming; however, they 
were led to believe that this was primarily 
a check on their honesty in meeting the as- 
signment. Therefore the hypothesis of in- 
creased cooperation would have to be arrived 
at individually during the course of the in- 
terviewing period. 

If neither of these lines of argument vali- 
dates the assumption of increased respondent 
cooperation it might be further argued that 
the question of the validity of the ratings is 
unimportant. If interviewers believe that 
forewarned respondents are more coopera- 
tive it makes very little difference whether 
they truly are more cooperative or not. It 
would seem that forewarning by mail can 
be an effective factor in making interview- 
ing a more pleasant occupation, and that it 
can be done fairly inexpensively. 


Received June 16, 1952. 
Color psychotherapy has aroused new interest in recent years and efforts have been
made to re-establish the reputation of this 
declining field. Some writers have suggested 
that color vision is influenced by emotional 
states. Kravkov (3) has found that, under 
adrenergic influence, the retina is more sensi- 
tive to blue-green and less sensitive to red- 
orange. 

From such studies it has been inferred by some that colors, as sensations (apart from symbolic content), are influential in producing states of emotion. In a recent popular book, Color psychology and color therapy, Birren (1, p. 150) draws upon such research to conclude: “To state a principle, it seems that the immediate action of any color stimulation is followed in time by a reverse effect. Red increases blood pressure, which later becomes normally depressed. Green and blue decrease blood pressure and later cause it to rise.”

Birren relies on the work of Goldstein (2)
for this generalization. He calls attention to 
Goldstein’s observation (1, p. 149): “One 
could say red is inciting to activity and 
favorable for emotionally-determined actions; 
green creates the condition for meditation 
and exact fulfillment of the task. Red may 
be suited to produce the emotional back- 
ground out of which ideas and actions will 
emerge; in green these ideas will be de- 
veloped and the actions executed.” 

One might inquire concerning the basis on which Goldstein formulates these “principles.” A report of his research is found in an article appearing in Occupational Therapy and Rehabilitation (2). Inasmuch as he merely refers to the research and describes neither procedure nor data, it is impossible to determine how he arrived at his conclusions. The general nature of his findings is this: any activity which takes place under red light or in which red equipment is used will tend to be performed in a more emotional manner, whereas activity engaged in under green light or with green equipment will be “thoughtful” in nature. He describes an experiment (no data included) in which it was demonstrated that a subject with arms extended in front would, when illuminated with red light, tend to move his arms outward. If illuminated with green light, he would tend to draw the arms together in front of the body. He also discusses the influence of colored light or colored ink on handwriting. He found: “Words written in red ink or green ink (if the patient pays attention to the color) show different size of letter and different distances between the letters. Handwriting in green light or with green ink is much more similar to the normal handwriting than that in red light or in red ink” (2, p. 149).

* From research conducted at Fort Hays Kansas State College. The author is indebted to Dr. J. T. Naramore, Supt., and Alexander J. Robinson, Clinical Psychologist, Larned State Hospital, for their help in this project.

Contrary findings are reported by Vollmer 
(9). He is unable to verify that arms of 
the subjects held forward and parallel deviate 
toward red and away from blue light. 

Lukens and Sherman (4) found that the 
use of red, black or white materials by pa- 
tients in weaving produced no differential 
results in woven objects. 

In view of the inconclusive and conflicting 
nature of the evidence on which much of 
the contemporary opinion concerning color 
therapy is based, fundamental, planned
experiments are necessary. The present article
reports such an experiment. 


The Experiment 


A total of 132 subjects were used. Of 
these, 66 were college students and 66 were 
patients in the State Mental Hospital at 
Larned, Kansas, all classified as psychotic,
psychoneurotic, or psychopathic personality 
and all in a state of remission suitable for 
occupational therapy and engaged in occu- 
pational therapy programs. 

Each subject was asked to write the fol- 
lowing statement in each of three colored 
inks, red, green, and black, and with a pen- 
holder which corresponded to the ink color. 


Dear Joe, We received your letter and 
expect to see you next week, (signature) 

Subjects were asked to write this statement on a sheet of white paper, 5½ × 8½ inches. Their attention was repeatedly directed to the fact that different ink colors were being used.

The particular statement was selected after 
preliminary experimentation, since it met the 
following requirements: (1) it was of such 
length that the average writer would not be 
tempted to cram it onto one line but could 
easily write it on two lines (length of the 
material should not influence choice of size 
or form of handwriting); (2) it was not so 
long that fatigue would be introduced; and 
(3) it was symbolically as meaningless as 
possible yet retained literary form. 

The two major groups were subdivided into 
six sub-groups of 11 normals and 11 abnor- 
mals, in order to equalize the effect of the 
order in which the different colors were used. 
Group I used inks in the order R Bl G;
Group II, R G Bl; Group III, G R Bl;
Group IV, G Bl R; Group V, Bl R G; and
Group VI, Bl G R.

This design supplied a total of 396 hand- 
writing samples, 132 in red ink, 132 in black 
ink and 132 in green ink, one third of each 
ink color having been written first, one third 
second and one third last in the series. 
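The six ink orders above are all six permutations of the three colors, so each color occupies each serial position in exactly two of the groups. The sketch below tallies samples per cell, using the group orders as listed and assuming each sub-group contains 11 normal and 11 NP writers. It yields 18 cells of 22 samples each, 396 in all. The function name `cell_counts` is illustrative.

```python
from collections import Counter
from itertools import permutations

# Ink orders for Groups I-VI as listed in the text (R = red, G = green, Bl = black).
ORDERS = {
    "I": ("R", "Bl", "G"),
    "II": ("R", "G", "Bl"),
    "III": ("G", "R", "Bl"),
    "IV": ("G", "Bl", "R"),
    "V": ("Bl", "R", "G"),
    "VI": ("Bl", "G", "R"),
}
PER_CLASS = 11  # 11 normal and 11 NP writers in each sub-group


def cell_counts(orders=ORDERS, per_class=PER_CLASS):
    """Count samples per (ink color, serial position, classification) cell."""
    counts = Counter()
    for order in orders.values():
        for position, color in enumerate(order, start=1):
            for classification in ("Normal", "NP"):
                counts[(color, position, classification)] += per_class
    return counts


# The listed orders are exactly the six permutations of the three colors.
assert set(ORDERS.values()) == set(permutations(("R", "G", "Bl")))
```

Because every color appears once in each position across any two complementary orders, position effects are spread evenly over the three colors.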

Handwriting samples were measured on a millimeter scale, and means were determined for each sub-sample. Variance estimates of the sub-groups and major groups were made.


Results 


Sub-sample, border, and total sample means are given in Table 1. Color (column) means do not differ appreciably, nor do the order-of-writing (row) means. However, means for normal and for psychiatric groups differ in every instance.

Table 1
Means of 18 Groups of 22 Samples Each (N = 396) of Handwriting Classified According to: (1) Ink Color in Which Written; (2) Order in Which Ink Is Used; and (3) Psychiatric Classification of Writer
(Measurements in millimeters)

[Rows: order of writing (1, 2, 3) and color means, each for Normal and NP writers; columns: Red, Green, Black, and order means. Legible cell means range from 20.9 to 22.4 mm for normal writers and from 26.4 to 26.9 mm for NP writers.]

Analysis of variance
Source: Ink color; Order of writing; NP classification; Color × order; Color × NP class; Order × NP class; Order × color × NP class; Individual differences
F ratios: Color/Ind. Diff.; Order/Ind. Diff.; NP Class/Ind. Diff.

F test reveals that the variance ratios in every instance are such as would be expected by chance, except for the differences between normal and abnormal groups. These differences are significant at the .05 level of probability. Variations not due to differences in psychiatric classification are due to individual differences. The remaining F's are so small as to leave no doubt that the null hypotheses must be accepted: the differences due to color of ink used, order in which the sample is written, interaction between color and order, interaction between order and psychiatric classification, interaction between ink color and psychiatric classification, and interaction among ink color, order of writing, and psychiatric classification are those which might be expected by chance from a random sample of handwritings.
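Each F ratio above divides a source's variance estimate by the individual-differences estimate. The study's analysis has three factors. The sketch below shows only the one-factor case (normal vs. NP). It uses invented toy measurements, not the study's data, simply to make the ratio concrete. The function name is illustrative.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    # Between-groups sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-groups (error) sum of squares: deviations from each group's own mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented toy sizes in millimeters (NOT the study's data).
toy_normal = [21.0, 22.0, 23.0]
toy_np = [26.0, 27.0, 28.0]
f_ratio, df_b, df_w = one_way_f([toy_normal, toy_np])
```

For these toy groups, F = 37.5 on 1 and 4 degrees of freedom. The F is large because the spread between the two group means is much greater than the spread within each group.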

It would be interesting to discover what 
it is that contributes to the significant differ- 
ences which exist between normal and psy- 
chiatric handwriting samples. However, the 
design of the present experiment does not 
permit inquiry into this matter. 


Summary 


Color of ink employed in handwriting has 
no influence on the size of the handwriting. 

Popular concepts concerning the influence 
of colored equipment or colored lights on 
motor performance (and possibly on emo- 
tional affect) must be revised until or unless 
more substantial evidence is uncovered to 
support these ideas. Nothing in the present 
experiment supports occupational therapy 
based on the influence of single colors. 


Received December 13, 1952. 
Early publication. 
In using the method of paired comparisons,
McCormick and his students (2, 3) have 
drawn attention to the feasibility of using 
partial pairing, as opposed to complete pair- 
ing, and have reported its use relative to the 
rating of employes. The partial pairing tech- 
nique should, under any circumstance, result 
in the abbreviation of the time required for 
the preparation, rating, and scoring of pairs 
in proportion to the extent that pairing is 
partial rather than complete. 
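The saving is simple arithmetic. A complete deck for N individuals contains N(N - 1)/2 pairs. A deck in which each individual is paired with only m others contains Nm/2 pairs, so the work shrinks in the same proportion. The sketch below computes both; the function names are illustrative.

```python
def complete_pairs(n):
    """Number of pairs in a complete pairing deck for n individuals."""
    return n * (n - 1) // 2


def pairs_in_deck(n, partners_each):
    """Pairs needed so that each of n individuals is paired with `partners_each` others."""
    if (n * partners_each) % 2:
        raise ValueError("n * partners_each must be even")
    return n * partners_each // 2


n = 20
fraction = pairs_in_deck(n, 4) / complete_pairs(n)  # share of the complete deck's work
```

For N = 20 with each person paired with 4 others, the deck has 40 pairs instead of 190, roughly a fifth of the complete deck.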

In a previous article (1) a procedure was 
discussed for the use of punched card equip- 
ment to facilitate rapid preparation and scor- 
ing of a complete pairing deck in accord with 
the traditional use of the paired comparison 
technique. This procedure can be tenably ap- 
plied with equal facility to prepare a partial 
pairing deck. It should be particularly useful 
in cases where the method is used with N’s of 
15-20 or greater. 


Partial Pairing 


If N is an even number, the minimum num- 
ber of pairs needed in a partial pairing deck 
is that required to give each of N individuals 
opportunity to receive at least one choice. 
The minimum number of pairs needed when N is an odd number, however, is that required to give each of N individuals opportunity for two choices. The composition of such a minimum partial pairing deck has been described by what Kephart and Oliver (1) have arbitrarily termed a “set.” Departure from complete pairing is conditioned by the number of “sets” incorporated in a partial pairing deck. The number of “patterns” (2) that may be used with any particular N is the number of possible combinations of “sets.” In this respect, of course, the inclusion of all sets results in complete pairing.

If N is an even number, there are N/2 sets in a complete pairing deck. As an example, consider an N of 6, permitting numbers to represent names being paired.


Set 1    Set 2    Set 3
1-2      1-3      1-4
2-3      2-4      2-5
3-4      3-5      3-6
4-5      4-6      4-1  }
5-6      5-1      5-2  }  DESTROY
6-1      6-2      6-3  }

One-half of set 3 is destroyed since each 
half contains the same three pairs and is 
extraneous to a complete pairing deck. The 
three remaining pairs in set 3 give each of 
the six individuals opportunity for one choice. 
Set 1 and set 2, each composed of 6 pairs, 
give each of the 6 individuals opportunity for 
2 choices. Therefore, we can pair everyone 
with one other individual by using only set 
3, everyone with 2 other individuals by using 
either set 1 or 2, everyone with 3 other in- 
dividuals by the combined use of set 3 with 
either 1 or 2, everyone with 4 other individ- 
uals by the combined use of set 1 and 2, or 
complete pairing by the use of all three sets. 
The small N of 6 is used for illustrative pur- 
poses only, but the same principle is opera- 
tive for an even N of any size. For ex- 
ample, if N were 50, we would have 25 sets. 
Pairing can be made partial in multiples of 
1, and the extent to which it is partial is de- 
termined only by the number of sets incor- 
porated in the final deck to be used. 
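The set construction can be sketched in a few lines. The cyclic rule used here is an assumption: set k pairs each individual with the one k places further along the list, wrapping from N back to 1. It reproduces Set 1 of the N = 7 example and the duplicated half of the last set when N is even, but the article does not state a general rule. The function name `pairing_sets` is illustrative.

```python
from itertools import combinations


def pairing_sets(n):
    """Return the cyclic 'sets' for n individuals numbered 1..n (assumed construction).

    Set k pairs individual i with individual i + k (wrapping around past n).
    For even n the last set (k = n/2) repeats its first half, so the second
    half is destroyed.
    """
    sets = []
    for k in range(1, n // 2 + 1):
        pairs = [(i, (i + k - 1) % n + 1) for i in range(1, n + 1)]
        if n % 2 == 0 and k == n // 2:
            pairs = pairs[: n // 2]  # destroy the duplicated half
        sets.append(pairs)
    return sets


# pairing_sets(6) ->
# [[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)],
#  [(1, 3), (2, 4), (3, 5), (4, 6), (5, 1), (6, 2)],
#  [(1, 4), (2, 5), (3, 6)]]
```

Taken together, the sets cover every pair exactly once. A partial deck is then simply any subset of the sets, which is why pairing can be reduced one set at a time.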

If N is an odd number, there are (N − 1)/2 sets. As an example, consider an N of 7, again permitting numbers to represent individual names in the pairs.

Set 1    Set 2    Set 3
1-2      1-3      1-4
2-3      2-4      2-5
3-4      3-5      3-6
4-5      4-6      4-7
5-6      5-7      5-1
6-7      6-1      6-2
7-1      7-2      7-3
Although one-half of the last set is al-
ways destroyed when N is even, this is not
characteristic when N is odd. Each of the
three sets above consists of 7 pairs, and 
gives each individual opportunity to receive 
2 choices. We can, therefore, incorporate 
either set 1, 2, or 3 into a partial pairing 
deck and have everyone paired with 2 other 
individuals, or use any two sets to pair 
everyone with 4 other individuals. The use 
of all 3 sets results in complete pairing. 
Therefore, when N is odd, pairing can be 
made partial in multiples of 2, and the extent 
to which it is partial is determined by the 
number of sets incorporated in the deck. 
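For odd N, the claim that pairing can be made partial in multiples of 2 can be checked directly. The sketch builds a deck from any chosen sets and counts how many partners each individual receives. It uses the same assumed cyclic rule as above (set k pairs i with i + k, wrapping around). `odd_n_deck` and `partners_per_individual` are hypothetical helper names.

```python
from collections import Counter


def odd_n_deck(n, set_numbers):
    """Build a partial pairing deck for odd n from the chosen set numbers (1-based).

    Assumed construction: set k pairs individual i with individual i + k (mod n).
    """
    if n % 2 == 0:
        raise ValueError("this sketch covers odd n only")
    max_set = (n - 1) // 2
    deck = []
    for k in set_numbers:
        if not 1 <= k <= max_set:
            raise ValueError(f"set number must be between 1 and {max_set}")
        deck.extend((i, (i + k - 1) % n + 1) for i in range(1, n + 1))
    return deck


def partners_per_individual(deck):
    """Count, for each individual, how many pairs (opportunities for choice) include them."""
    counts = Counter()
    for a, b in deck:
        counts[a] += 1
        counts[b] += 1
    return counts
```

For example, `odd_n_deck(7, [1, 3])` gives a 14-pair deck in which every individual is paired with 4 others, and using all three sets gives the complete deck of 21 pairs.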


Summary 


The method of paired comparisons has 
long been considered somewhat laborious to 
say the least. In a previous article (1) a 
punched card procedure was outlined to facili- 
tate rapid preparation and scoring of the 
pairs as the method has been traditionally
used. The discussion of the punched card 
procedure has here been extended to draw 
attention to its applicability to partial pair- 
ing, a technique to further abbreviate time 
and labor requirements in preparing, rating, 
and scoring the pairs. The procedure is 
systematic and may be used with any num- 
ber of variables. 


Received June 7, 1952. 
Accuracy of dial reading and the conditions of which it is a function constitute a
problem of interest to those psychologists 
who are concerned with display problems. 
The facts of the relationship between ac- 
curacy and its determinants have important 
applications to industrial and military situa- 
tions. Reviews of the previous work accom- 
plished have been presented in several sources 
(1, 3, 5, 8). 

Although a considerable amount of experi- 
mental effort has been expended in this area, 
little attention has yet been paid to the 
specific questions with which the present study 
was concerned. In general, this experiment 
attempted to determine the relationship be- 
tween the accuracy of reading and the dial 
sector and specific location of the dial pointer. 

Kappauf and Smith (7) found that the 
sector had no consistent effect on either local 
errors or systematic errors for many dials, 
but sector location may influence the occur- 
rence of specific systematic errors on certain 
scales. Dials graduated from 0 to 50 and 
0 to 100 revealed an error more prevalent on 
right dial halves than on left halves on scales 
numbered by tens. 

Christensen (2) studied exposure time as a 
factor in dial reading performance. Moving 
scale dials were better at short exposures 
while moving pointer dials were better at long 
exposures. Sleight (9) compared dial shapes 
for legibility. In the order of accuracy of 
readings the dials ranked as follows: (1) 
open-window; (2) round; (3) semicircular; 
(4) horizontal; and (5) vertical. 

In a study of instrument recording per- 
formance under varied illuminating condi- 
tions, Spencer (10) reported readings most 
accurate at the 12 o'clock sector of the dial, 
but his results were not consistent. In studies of check reading of fixed-scale, moving
pointer instruments, Warrick and Grether 
(11) and Grether and Connell (4) reported 
more frequent correct responses when the 
index is at the 9 o'clock position than when 
it is at the 3 o'clock position. 

In a study of the effect of pointer design 
and pointer alignment position on speed and 
accuracy of instrument readings, White (12) 
had his subjects make a qualitative reading 
of the deviation from vertical among 16 
simulated engine instruments in order to 
make a correction. Alignment at the 9 
o'clock position was superior for qualitative 
reading. In another experiment his subjects 
had to check-read a panel of simulated in- 
struments with pointer alignment at the 9, 
12, 3, and 6 o'clock positions and indicate 
misalignment. No significant differences in 
response time and errors were found. Hor- 
ton (6) found an increase in the frequency 
of systematic errors with sector errors being 
more than twice as frequent on the left half 
of the scope as on the right. In an un- 
published study from this laboratory it was 
found that fewer errors were made at and 
around the 9, 12, and 3 o'clock positions and 
more errors were made at some intermediate 
points in a circular dial. Our results in this 
respect were not entirely consistent, how- 
ever, because in both groups there were mid- 
division settings which were not numbered. 

From the literature cited several findings 
are of particular interest in connection with 
the present study. Kappauf and Smith (7) 
found that sector had no consistent effect. 
When reversal errors were frequent, sector 
was then observed to be important. Spencer 
(10) reported more accurate readings in the 
12 o’clock sector, but his results were not 
consistent. White (12) found that the 9 
o'clock position was superior for reading of
deviations from vertical. Finally, we ob- 
served a tendency for fewer errors at and 
around the 9, 12, and 3 o’clock positions. 

Three dial shapes were used in the present 
study in an attempt to answer certain ques- 
tions which can be raised concerning the in- 
fluence of sector and pointer location on 
accuracy of reading. These dials are: (A) 
semicircular upright dial; (B) semicircular 
inverted dial; and (C) circular dial (see Fig- 
ure 1). 

The following specific questions were asked 
concerning accuracy of reading the three 
dials: 

1) Are errors in a particular quadrant a function of the dial shape in which the quadrant occurs?

2) Are intra-dial errors for Dial C a function of the quadrant in which readings are made?

3) Are intra-dial errors for Dial C related in a systematic way to pointer positions of 9, 12, 3, and 6 o’clock compared to intermediate positions?

4) Are errors a function of the dial half (upper and lower) in which readings are made?

Fig. 1. The dial shapes used in the experiment.


Method and Procedure 


Subjects: The subjects used in the experi- 
ment were eight male and two female uni- 
versity students. They ranged in age from 
20 to 30 years. Each subject had a minimum 
Snellen index of 20/20 (corrected or uncor- 
rected) in each eye. 

Apparatus: The apparatus used to present 
the dial settings to the subjects was a modi- 
fication of the Dodge tachistoscope. The in- 
terior was painted black, and the subject 
viewed a single dial through a binocular eye- 
piece. The pre-adapting illumination and 
the presentation illumination were provided 
by two pairs of 25 watt bulbs. The distance 
from the subject’s eyes to the test dials was 
42 in. 

An electronic interval timer was used to 
present exposure periods which were set at 
0.1, 0.3, 0.4, 0.5, and 0.7 sec. 

The dials were constructed to follow the 
design characteristics suggested in “Stand- 
ards to be Employed in Research on Visual 
Displays,” Armed Forces-NRC, Vision Com-
mittee, 1 March 1950. All characteristics of
the three dials were held constant as shown 
below: 

1. All numbers were made by India Ink on 
white cardboard using a No. 3 pen and a 
LeRoy lettering guide. 

2. The diameter of each dial was 2% in. 

3. The distance between graduations along 
the circumference of the scale was 34 in. and 
the length of each graduation unit was *4¢ in. 

4. The height of each numeral was 5%» in., 
and the stroke width was approximately 
Vo in. 

5. The 0 setting for each dial was at 9 o'clock and the 10 setting was at 3 o'clock.

6. The pointer was !%4¢ in. long and was l4 in. wide.

Table 1
The Error Scores for the Three Dials Tested

Columns (one row per subject): Dial A, whole dial, QI, QII; Dial B, whole dial, QIII, QIV; Dial C, upper half, lower half, QI, QII, QIII, QIV, intermediate points, cardinal points.
[Individual error scores not recoverable from the scanned text.]

Each dial was mounted on stiff black card- 
board. The settings on the test dial were 
manipulated by means of a larger dial placed 
on the reverse side of the test dial. Thus, 
settings on each dial could be quickly and 
conveniently changed. 

Procedure: After the subject was seated 
before the binocular eyepiece of the tachisto- 
scope, a dial was exposed for an unlimited 
exposure. The subject was shown the dial, 
and its units and graduations were pointed 
out. The subject was shown several settings, 
and was told that he would be required to 
report the pointer position. The pointer was 
set either on the graduation marker or mid- 
way between two graduation markers. The 
subject was also shown the dial under the 
conditions of timed exposures. The experi- 
menter called “ready” when a trial was to 
be started, and the click of the interval timer 
signaled the starting of the timed exposure. 
The subject reported “11,” “9½,” “6½,”
“17,” etc. Dials A and B had 21 possible 
settings while Dial C had 40 possible settings. 

The order of presentation of dials, the setting on each dial, and the time interval were systematically varied in order to handle possible practice and fatigue effects. Dial A and Dial B were each presented 105 times for each subject, involving the 21 different dial settings and the five time intervals tested. Dial C was presented a total of 200 times to each subject. Thus each subject made 410 judgments involving the three dial faces tested, and the results presented are based on a total of 4,100 judgments.
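The trial counts follow from crossing each dial's settings with the five exposure intervals. The sketch below assumes Dials A and B run from 0 to 10 in half-unit steps (21 settings). It assumes Dial C runs from 0 to 19½ (40 settings, with 20 coinciding with 0). Those setting values are assumptions consistent with the counts given, and the names are illustrative. It builds the full factorial list only; the article's systematic ordering of trials is not reproduced.

```python
from itertools import product

EXPOSURES_SEC = (0.1, 0.3, 0.4, 0.5, 0.7)  # exposure intervals from the Apparatus section
SETTINGS = {
    "A": [k / 2 for k in range(21)],  # assumed 0, 0.5, ..., 10
    "B": [k / 2 for k in range(21)],  # assumed 0, 0.5, ..., 10
    "C": [k / 2 for k in range(40)],  # assumed 0, 0.5, ..., 19.5
}
N_SUBJECTS = 10


def trial_list(dial):
    """Every (dial, setting, exposure) combination for one dial."""
    return list(product([dial], SETTINGS[dial], EXPOSURES_SEC))


def judgments_per_subject():
    return sum(len(trial_list(d)) for d in SETTINGS)
```

This gives 105 trials each for Dials A and B, 200 for Dial C, 410 per subject, and 4,100 over the ten subjects.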


Results and Discussion 


The error score for a given individual for any set of dial readings was found by summing twice the deviation from the actual setting. Thus, $\text{score} = \sum (2E)$, where $E$ is the deviation of the subject's reading from the actual dial setting. Each deviation was multiplied by two simply to eliminate decimals. These results are shown in Table 1 for each of the three dials tested. The table shows the total error score for each individual for comparable sections, dial-wise or quadrant-wise, for Dials A, B, and C. In addition, for Dial C the total error score is shown for cardinal settings (0, 5, 10, and 15) and for intermediate settings (2, 3, 7, 8, 12, 13, 17, and 18). The quadrants referred to are designated as follows: (I) upper-right, (II) upper-left, (III) lower-left, and (IV) lower-right.
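The scoring rule can be written out directly. The sketch assumes each trial is recorded as a (setting, reading) pair in dial units. It also assumes E is taken as an unsigned deviation; the article says only “deviation,” and signed deviations would cancel when summed. The two setting groups are the Dial C cardinal and intermediate settings listed above. Helper names are illustrative.

```python
CARDINAL = {0, 5, 10, 15}
INTERMEDIATE = {2, 3, 7, 8, 12, 13, 17, 18}


def error_score(trials):
    """Sum of 2 * |reading - setting|; doubling turns half-unit errors into integers."""
    return sum(round(2 * abs(reading - setting)) for setting, reading in trials)


def score_for_settings(trials, settings):
    """Error score restricted to trials whose true setting is in `settings`."""
    return error_score([(s, r) for s, r in trials if s in settings])


# Hypothetical (setting, reading) pairs for one subject on Dial C.
example = [(5, 5), (7, 7.5), (12, 11), (0, 0.5), (3, 3)]
```

For the hypothetical trials shown, the total score is 4, of which 1 comes from cardinal settings and 3 from intermediate ones.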

For any statistical test of significance a 
difference score was found for each individ- 
ual. The standard error was then computed 
from the distribution of differences, thus al- 
lowing for the correlation among individuals. 
The results of the tests of significance (t test)
are shown in Tables 2, 3, and 4. 
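Computing the standard error from each subject's difference score is the ordinary paired (correlated-samples) t test. A sketch with an illustrative function name follows. With ten subjects there are 9 degrees of freedom, and the two-tailed 5 per cent critical value of t is about 2.262.

```python
import math
from statistics import mean, stdev

T_CRIT_9DF = 2.262  # two-tailed 5 per cent critical t for 9 df (standard t tables)


def paired_t(scores_a, scores_b):
    """Paired t test on per-subject error scores; returns (t, degrees of freedom)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("need one pair of scores per subject")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # The standard error comes from the spread of the differences themselves,
    # which allows for the correlation between a subject's two scores.
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1
```

For instance, `paired_t([4, 6, 8], [3, 4, 5])` has differences 1, 2, 3 and returns t = 2√3 (about 3.46) on 2 degrees of freedom.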
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Table 2 


Significance of Difference Between Error Scores in 
Comparable Quadrants of Dials A, B, and C 








Quadrant    Dial Comparison    t-value
I           A vs. C            0.62
II          A vs. C            0.45
III         B vs. C            1.73
IV          B vs. C            0.01


Table 3 


Significance of Differences Between Error Scores Made
in Quadrants of Dial C Expressed as t-values








Quadrant 


I 
II 


1.14 


IV 0.81 


Table 4 


Significance of Differences in Total Error Scores in 
Dials A, B, and C Expressed as t-values 


Dials             A       B       C (upper half)
C (upper half)    0.05    0.90    -
C (lower half)    0.41    1.16    0.65


Four sets of t tests were made relative to
the four questions previously raised. The 
questions are restated here, as follows: 


1) Are errors in a given quadrant a func- 
tion of the dial shape in which the quadrant 
occurs? 

2) Are intra-dial errors (Dial C) a func- 
tion of the quadrant in which readings are 
made? 

3) Are intra-dial errors (Dial C) related 
in a systematic way to pointer positions of 
0, 90, 180, and 270 degrees compared to 
intermediate positions? 

4) Are errors a function of the dial half 
(upper vs. lower) in which readings are 
made? 

From the results of the tests of significance 
shown in Table 2, it is clear that dial shape
for the three dials used in the experiment has
not been demonstrated to be an important 
factor in reading accuracy when the errors 
produced are considered on a quadrant basis, 
since all of the t-values shown are insignificant at the 5 per cent level of confidence.

From the results shown in Table 3 dealing 
only with the errors produced in the circular 
dial (Dial C), it may be concluded that the 
quadrant from which the settings are read 
has not been demonstrated to be a significant 
factor in reading accuracy, since all of the 
t-values shown are insignificant at the 5 per 
cent level of confidence. 

The third major comparison to be consid- 
ered in the analysis of the data is the result 
of the comparison of error performance when 
errors made at dial settings 0, 5, 10, and 15 
are compared with errors made at settings 
2, 3, 7, 8, 12, 13, 17, and 18 for the circular 
dial (Dial C). The t-value here is 5.89 and
the difference is significant at the .01 level. 
We, therefore, conclude that in the circular 
dial used in the study significantly fewer 
errors were made at the 9, 12, 3, and 6 o’clock 
positions than at the tested intermediate 
points. 
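The correspondence between Dial C settings and clock positions can be made explicit with a short Python sketch. It assumes 20 scale units per revolution, because settings 0, 5, 10, and 15 lie 90 degrees apart. It also assumes that 0 sits at 9 o'clock and that values increase clockwise, which matches the pairing of settings 0, 5, 10, and 15 with 9, 12, 3, and 6 o'clock. The function names are hypothetical.

```python
UNITS_PER_REVOLUTION = 20  # assumption: 0, 5, 10, 15 are 90 degrees apart
DEGREES_PER_UNIT = 360 / UNITS_PER_REVOLUTION  # 18 degrees


def clock_angle(setting: float) -> float:
    """Clockwise angle in degrees from the 9 o'clock position."""
    return (setting % UNITS_PER_REVOLUTION) * DEGREES_PER_UNIT


def clock_position(setting: float) -> float:
    """Clock-face hour (12 at the top) pointed to by a Dial C setting."""
    hours_from_nine = clock_angle(setting) / 30.0  # 30 degrees per hour mark
    hour = (9 + hours_from_nine) % 12
    return 12.0 if hour == 0 else hour


print([clock_position(s) for s in (0, 5, 10, 15)])  # [9.0, 12.0, 3.0, 6.0]
```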

Table 4 shows the comparison of the ac- 
curacy of reading the upper half of Dial C 
with Dial A, the lower half of Dial C with 
Dial B, Dial A with Dial B, etc. Here again 
none of the t-values are significant at the 5
per cent level of confidence. The results 
show that accuracy of reading in upper and 
lower dial halves does not differ significantly 
in the set of dials used in this study. 
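The significance decisions above can be checked against tabled critical values. This sketch assumes two-tailed tests with df = 9, which follows from ten subjects and difference scores. The critical values 2.262 (5 per cent) and 3.250 (1 per cent) are standard t-table entries. The t-values are those reported in Tables 2 and 4 and in the cardinal-versus-intermediate comparison. The dictionary and function names are hypothetical.

```python
# Two-tailed critical values of t for df = 9 (10 subjects), from standard tables.
T_CRIT_DF9 = {0.01: 3.250, 0.05: 2.262}

REPORTED_T = {
    "Table 2 (quadrants, by dial shape)": [0.62, 0.45, 1.73, 0.01],
    "Table 4 (dial halves)": [0.05, 0.90, 0.41, 1.16, 0.65],
    "Dial C, cardinal vs. intermediate": [5.89],
}


def significance_level(t, crit=T_CRIT_DF9):
    """Smallest tabled alpha at which |t| reaches the critical value, else None."""
    for alpha in sorted(crit):  # check the stricter level (0.01) first
        if abs(t) >= crit[alpha]:
            return alpha
    return None


for label, values in REPORTED_T.items():
    print(label, [significance_level(t) for t in values])
```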

One additional finding should be noted. 
The results of previous investigations (2) 
concerning the effect of exposure time were 
verified. Errors decreased as length of ex- 
posure time increased. 


Summary 


The purpose of this experiment was the 
determination of the relationship between ac- 
curacy of dial reading and the sector and 
specific location of the dial pointer. The 
three dials used were a semicircular upright 
dial, a semicircular inverted dial, and a cir- 
cular dial. Ten subjects made a total of 
4,100 judgments at five exposure times on the
three dials. 

Tests of significance for error scores were 
made and permitted the following conclu- 
sions: 

1) Differences in dial shape were not an 
important source of error. 

2) Differences in sector location of the 
dial pointer were not an important source of 
error. 

3) Significant differences in error scores 
were found for readings made at 9, 12, 3, and 
6 o'clock positions corresponding to pointer 
settings at 0, 5, 10, and 15 when compared
with intermediate points. 

4) No significant differences in error scores 
were found when upper and lower dial halves 
were compared. 

These findings suggest that critical regions 
of a scale should be assigned to the 9, 12, 3, 
or 6 o'clock positions of a circular dial, and 
that factors other than errors may be con- 
sidered in the choice of a dial from among 
the three types studied here. 


Received May 22, 1952. 
Dimensional Analysis of Motion: V. An Analytic Test of
Psychomotor Ability * 


Shelby Harris and Karl U. Smith 


University of Wisconsin 


The present paper describes a new test of 
psychomotor skills, based on dimensional 
and component analysis of movements in mo- 
tion. This test, which has been named the 
Analytic Reactometer, permits separate and 
automatic registration of the travel and ma- 
nipulation components of motion involved in 
the successive grasping and manipulating of 
objects.2 This development in psychomotor 
testing is of considerable significance for 
several reasons: (1) the test provides more 
detailed and precise measures of the com- 
ponents of motion than has been previously 
possible; (2) it permits, within the same 
instrument, systematic variation of several 
dimensions of motion, such as extent of move- 
ment, direction of movement, extent of ma- 
nipulation, complexity of movement and ma- 
nipulation, plane of movement, hand involved, 
etc.; (3) it provides a means of analyzing 
errors of manipulation in terms of the vari- 
ous dimensions of motion and with regard 
to the component time scores; and (4) the 
principles involved in the test may be in- 
corporated in all types of psychomotor tests 
which may be designed to simulate various 
types of work situations. The desirability of 
a psychomotor test situation which will ac- 
complish the objectives named is indicated 
by the fact that different components and 
dimensions of movement in skilled motion 
patterns are functionally distinct (1, 2, 3, 5, 
6) and often uncorrelated. The test to be 
described has some significance for the field 
of personnel selection, but it is believed that 
the methods and results described herein are 
more immediately applicable to problems of 
detailed measurement of different types of human manual performance in medical and industrial research.

1 This study has been supported by funds voted by the Legislature, The State of Wisconsin, and assigned by the Graduate School Research Committee, The University of Wisconsin.

2 The analysis of results of this study has been aided by the facilities of the Computing Service, The University of Wisconsin.


Methods 


The Analytic Reactometer is designed in 
terms of two main features: (1) control of 
the space dimensions of the motion pattern; 
and (2) separate measurement of the ma- 
nipulative and travel components of motion. 

The planned performance situation used in 
the present form of the test is a control panel 
45.7 cm. square, on which are mounted 25 
rotary switches in 5 rows of 5 switches, each 
spaced 7.6 cm. apart (Figure 1). Each 
switch has 17 settings, selected points of 
which are marked as shown in Figure 1. 
The positions thus marked are 40°, 80°, and 
180° clockwise and 40° and 80° counter- 
clockwise. 

The manipulative and travel components 
of motion are measured separately in this 
test by means of an electronic motion ana- 
lyzer (4), consisting of a balanced relay cir- 
cuit, in which the subject acts as a key. 
When the subject touches one of the switches, 
the analyzer is activated and elapsed time is 
recorded on a precision time clock³ in hundredths of a second until contact with this switch is broken. When the clock measuring manipulation time stops, a second clock, measuring travel time, starts, and continues to run until the next switch is touched. Thus, the elapsed time in operating any pattern of the switches is totalled separately for manipulation and travel movement components by means of the two clocks.

The following types of scores may be obtained on the test: (1) time involved in turning the 25 switches; (2) time of travel between the 25 switches; (3) total time involved in both manipulation and travel; and (4) errors made in positioning the switches.

3 Model S1, Standard Time Clock, Standard Electric Time Company, Springfield, Massachusetts.

Figure 1. Diagram of Analytic Reactometer showing the arrangement of controls on the panel and the timing mechanism. The inset illustrates the design of each manual control. The special mount for the control panel makes it possible to position the panel in different planes.
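The way the two clocks partition elapsed time can be sketched in Python. The apparatus itself used relay-driven clocks, so the list of (touch, release) timestamps used here is a hypothetical stand-in for its events. The sketch also assumes that time before the first touch and after the last release is not counted.

```python
from typing import List, Tuple

Contact = Tuple[float, float]  # (time switch touched, time contact broken), seconds


def split_times(contacts: List[Contact]) -> Tuple[float, float, float]:
    """Return (manipulation, travel, total) times, rounded to 0.01 s.

    Manipulation accrues while a switch is touched; travel accrues from the
    release of one switch to the touch of the next.
    """
    for touch, release in contacts:
        if release < touch:
            raise ValueError("release precedes touch")
    manipulation = sum(release - touch for touch, release in contacts)
    travel = 0.0
    for (_, prev_release), (next_touch, _) in zip(contacts, contacts[1:]):
        if next_touch < prev_release:
            raise ValueError("contacts overlap")
        travel += next_touch - prev_release
    return (round(manipulation, 2), round(travel, 2),
            round(manipulation + travel, 2))


print(split_times([(0.00, 0.60), (1.00, 1.55), (2.05, 2.70)]))  # (1.8, 0.9, 2.7)
```

Because each clock stops exactly when the other starts, the total score equals the span from the first touch to the last release.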
The reactometer permits testing of the per- 
formance of either hand, with different planes, 
directions, and magnitudes of movement. To 
vary these dimensions of motion, different 
settings and patterns of switches may be
used or the control panel itself may be
changed from one plane to another. The 
whole test is constructed to be easily trans- 
ported. 

The main objective of this study has been 
to analyze, in terms of correlation procedures, 
the interrelations between different reactive 
variables which typically enter into perform- 
ance on psychomotor tests. Specifically, the 
reliability of scores related to different di- 
mensions and components of motion, as per- 
formed in the test situation, has been de- 
termined. In addition, intercorrelations be- 
tween the components of motion and between 
tests involving different dimensions of mo- 
tion have been computed. 

Twenty tests were carried out in the study 
which covered the following aspects of mo- 
tion: (1) right and left directions of manipu- 
lation; (2) different directions of travel 
movement; (3) performance with each hand; 
(4) horizontal and vertical planes of mo- 
tion; and (5) simple and complex patterns of 
manipulation. All switches on the board
were used in each test. In all of the tests
the manipulative movement consisted of a 
40° rotation of the switch, either right or 
left to positions 1 or 5 respectively. The 
travel movement from switch to switch was 
horizontal (left to right) in some tests and 
vertical (downward) in other tests. Com- 
plex manipulation patterns differed from the 
simple ones in that alternate switches were 
turned in opposite directions. It was not 
feasible to use a balanced sequence of the 
different tests to control practice effects. In- 
stead, for this preliminary study, the twenty 
different tests were administered to all sub- 
jects in the same sequence. 
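The twenty conditions, in the order listed in Table 2 (which the text says is the order of administration), can be generated as a small factorial design. This is an illustrative Python sketch. The tuple layout and names are hypothetical. The horizontal-plane travel label "R." for tests 1 to 3 is assumed to follow the pattern of test 4.

```python
from itertools import product

HANDS = ("R. H.", "L. H.")
MANIPS = ("R. Manip.", "L. Manip.")


def simple_tests():
    """Tests 1-16: plane x travel direction x hand x manipulation direction."""
    travel_by_plane = {"Horizontal": ("R.", "In"), "Vertical": ("Right", "Down")}
    tests = []
    for plane, travels in travel_by_plane.items():
        for travel, hand, manip in product(travels, HANDS, MANIPS):
            tests.append((plane, hand, manip, f"Trav. {travel}"))
    return tests


def complex_tests():
    """Tests 17-20: vertical plane, alternate switches turned in opposite directions."""
    return [("Vertical", hand, "R-L Manip.", f"Trav. {travel}")
            for travel, hand in product(("Right", "Down"), HANDS)]


ALL_TESTS = simple_tests() + complex_tests()
for number, test in enumerate(ALL_TESTS, start=1):
    print(number, test)
```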

A total of 78 college students served as subjects in the study. All 20 tests described above were given to each subject. Each test required approximately one minute to administer. The subject was instructed to turn the switches on the panel in the predefined patterns as rapidly as possible and at the same time to be careful to position each switch accurately. When a new general pattern of motion was introduced, the subject was given a practice trial on the first ten switches to be turned in this pattern of motion. Reliability figures and intercorrelations between components and dimensions of motion were computed not only for each individual test but also for combined scores of the various tests involving a common dimension of motion. Of the 78 subjects, 49 repeated the sequence of tests some 10 to 14 days after the initial administration. Data obtained on these subjects are used to compute the test-retest reliability of the different measures obtained on the Reactometer.

Figure 2. Distributions of manipulation and travel scores in the simple manipulation pattern for the right and for the left hand in 78 subjects (number of subjects plotted against score in seconds). The distributions of scores for manipulation by the two hands are identical, whereas the travel scores show some discrepancy between the hands. Both pairs of distributions are slightly skewed positively.


Results 


Typical distributions of test scores are 
presented in Figures 2 and 3. The distribu- 
tions for the right and left hands shown in 
Figure 2 are based on combined scores for 
all tests involving simple manipulation pat- 
terns. There were 8 of these tests for each 
of the hands. Figure 3 shows the analogous 
Figure 3. Distribution of scores for 78 subjects in the complex manipulation patterns for travel and manipulation. The pattern of scores found for the simple manipulation patterns appears also in these complex patterns.
Table 1











                 First Test                          Second Test
                 Manipulation      Travel            Manipulation      Travel
                 M       σ         M       σ         M       σ         M       σ


68.5 17.4 42.2 9.2 58.1 13.1 
Ver. Plane 14.2 40.2 13.0 


Right Hand 63.4 15.0 40.2 8.9 54. 12.7 
Left Hand 64.2 16.0 42.2 8.6 56. 13.2 


Lat. Direction 15.7 43.9 9.4 , 12.5 
Ver. Direction 62.3 15.3 38.5 8.1 55. 13.1 


Manip. Right 03.9 15.4 41.0 
Manip. Left 63.8 15.3 41.3 8.4 


Total Simple 127.7 30.5 82.3 17.1 
Total Complex 36.1 8.6 20.5 4.2 


Table 2 


Test-Retest Reliability of the Component Tests with Respect to Both 
Manipulation and Travel Scores 


Note: Each test was of one minute duration. 


I. Simple Manipulation
A. Horizontal Plane
1. R. H., R. Manip., Trav. R.
2. R. H., L. Manip., Trav. R.
3. L. H., R. Manip., Trav. R.
4. L. H., L. Manip., Trav. R.
5. R. H., R. Manip., Trav. In
6. R. H., L. Manip., Trav. In
7. L. H., R. Manip., Trav. In
8. L. H., L. Manip., Trav. In
B. Vertical Plane
9. R. H., R. Manip., Trav. Right
10. R. H., L. Manip., Trav. Right
11. L. H., R. Manip., Trav. Right
12. L. H., L. Manip., Trav. Right
13. R. H., R. Manip., Trav. Down
14. R. H., L. Manip., Trav. Down
15. L. H., R. Manip., Trav. Down
16. L. H., L. Manip., Trav. Down
II. Complex Manipulation
A. Vertical Plane
17. R. H., R-L Manip., Trav. Right
18. L. H., R-L Manip., Trav. Right
19. R. H., R-L Manip., Trav. Down
20. L. H., R-L Manip., Trav. Down










Table 3 


Test-Retest Reliability for Various Combined Scores 





                              Manipulation    Travel
Right Hand                    .81             .75
Left Hand                     .87
Right Manipulation            .85
Left Manipulation             .84
Lateral Travel                .83
Down and In Travel            .85
Horizontal Plane              .81
Vertical Plane
Total Simple Manipulation
Total Complex Manipulation    .86


distributions for combined scores of four tests 
involving complex manipulation patterns. It 
may be seen from these distributions that 
both the manipulation time and travel time 
distributions are similar for the two hands. 
All of the distributions approach normality. 

The test and retest means and standard 
deviations for various combined scores are 
given in Table 1. Each of the combined 
scores is based on the performance of 49 sub- 
jects for all of the tests which involved the 
specified dimension. With the exception of 
the compound score of all tests involving 
complex manipulation, all of these figures
are based on the tests involving simple
manipulation patterns. Comparison of the 
means for various dimensions of motion is 
not justified due to the lack of control over 
practice effects. 

Tables 2 and 3 present the test-retest re- 
liability figures for the twenty individual tests 
and for the various combined scores. The 
reliability figures for the individual tests are 
presented in the order that the tests were ad- 
ministered. The combined scores on which 
the reliability figures in Table 3 are based 
are the same as those of Table 1. All of the 
reliability values are relatively high. The 
manipulation-time coefficients are consistently 
higher than the travel-time values. 
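A test-retest reliability coefficient of this kind is a correlation between each subject's first and second scores. The text does not name the coefficient, so this Python sketch assumes the usual product-moment r. The function name and the five pairs of scores are hypothetical, not data from the 49 retested subjects.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between first-test and retest scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need paired scores for at least 2 subjects")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant score has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


first = [60, 70, 55, 80, 65]   # hypothetical manipulation times, first test
retest = [52, 61, 50, 70, 58]  # same subjects, second test
print(round(pearson_r(first, retest), 3))
```

Note that r is unaffected by a uniform drop in times from first test to retest, so practice effects that shift everyone equally do not lower the reliability figure.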

Table 4 shows the correlations between the 
manipulation and travel components of mo- 
tion for the combined scores and the correla- 
tions between the several dimensions of mo- 
tion involved in the study. All of these 
figures, which are based on data of 78 sub- 
jects, are positive coefficients. It is obvious 
from the table that the relationship between 
the components of motion is consistently 
low. Nine of these coefficients are signifi- 
cantly different from zero at the one per cent 
level. One is significant at the five per cent
level. The correlations between dimensions 
of motion are high for both manipulation and 


Table 4 


Correlations Between Components and Dimensions of Motion 


Right Hand 

Left Hand 

Right Manipulation 
Left Manipulation 


Lateral Travel 
Down and In Travel 


Horizontal Plane 
Vertical Plane 
Total Simple Manipulation 


Total Complex Manipulation 


Man. vs. Trav. 


Correlation between Dimensions 





Manipulation Travel 
31°° 90"


a” 
34” 


35+* acai 


.o4** 
Al** 


29** 92° % 


.96** 


a 
.29** 


gS5** 


3G6”* 
a” 


* Significant at 5% level. 
** Significant at 1% level. 
travel components of motion. Among the
correlations between dimensions, the values 
for planes of motion are somewhat lower than 
those for other dimensions. Generally, the 
intercorrelations between dimensions of mo- 
tion are higher for the manipulative aspects 
of motion than those for the travel com- 
ponents. 
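Whether a correlation based on 78 subjects differs from zero can be checked with the usual transformation t = r√(n − 2)/√(1 − r²), with df = n − 2. The authors' exact procedure is not stated, so this Python sketch is an assumption. The critical t values for df = 76 are approximate figures from standard tables.

```python
import math

# Approximate two-tailed critical t values for df = 76 (n = 78), from standard tables.
T_CRIT_DF76 = {0.01: 2.642, 0.05: 1.992}


def t_from_r(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def r_critical(t_crit: float, n: int) -> float:
    """Smallest |r| that reaches t_crit for a sample of n."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


for alpha, t_crit in sorted(T_CRIT_DF76.items()):
    print(alpha, round(r_critical(t_crit, 78), 3))
```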


Summary 


A special psychomotor test for separate 
measurement of the travel and manipulation 
components of motion has been described. 
The test, called the Analytic Reactometer, 
permits controlled variation and measure- 
ment of different bodily and space dimen- 
sions of motion which are involved in various 
types of motion patterns. 

Preliminary investigations employing the instrument have yielded the following general results:

1. Critical sources of variation in perform- 
ance in various motion patterns of the type 
studied are related to the manipulation and 
travel components of motion. 

2. Performances in different space dimen- 
sions of both manipulation and travel move- 
ments correlate highly with one another. 

3. The reliability of specific tests related to hands, planes, direction of travel, direction of manipulation and complexity of the manipulation pattern in the general test situation described typically exceeds +.80 for manipulation and +.75 for travel movements.

4. The present test, and the principles be- 
hind it, provide one means of securing pre- 
cise and analytical data for exact quanti- 
tative specification of motions and motion 
functions. Application of analytical methods 
described to studies of growth, aging, neuro- 
logical deficiency, and to industrial selec- 
tion may advance considerably the scientific 
validity of data concerning human motion. 


Received June 13, 1952. 
Applied Psychology in Action


Editor’s Note: With this issue, we begin 
what may become a regular feature of the 
Journal of Applied Psychology. We plan to 
publish brief descriptions of applied psy- 
chology in action to be written by psycholo- 
gists who are applying psychology in real 
life situations. Brief news notes concerning 
applied psychology in action from a variety 
of sources will be published. Descriptions 
of procedures and techniques believed to be 
effective, even though desirable experimental 
controls may not have been possible, will be included. Thus, a forum for the interchange of practical information will be provided for practitioners of applied psychology. In part,
this new feature of the Journal of Applied 
Psychology attempts to meet the challenge 
contained in Dr. Marion A. Bills’ presidential 
address before the Division of Industrial and 
Business Psychology last September. It is 
appropriate, therefore, to begin with the publi- 
cation of her provocative address. 


Our Expanding Responsibilities * 


Marion A. Bills 


Aetna Life Insurance Company, Hartford, Connecticut 


Three items lead me to the choice of the 
title for this talk: (1) the most interesting 
diaries which many of the psychologists who 
are working full time in industry have kept 
for two weeks and sent in as a foundation 
for a case book in industrial psychology; 
(2) a meeting which I attended of psychia- 
trists and psychologists working in industry 
which was held in Asbury Park this spring; 
and, (3) the criticisms which have been 
made, some in writing and many in dis- 
cussions, that our published research is at 
a very superficial level. 

The diaries which we have received from 
individuals in private industry indicate clearly 
that our duties spread over the entire field of 
management. There was about an even di- 
vision between duties which are involved in 
the setting of policies and those which have 
to do with the administration of those policies. 

Some of the diaries indicated concentra- 
tion of effort in a given field. Almost the 
full two weeks of one psychologist’s time was 
spent on where to locate a new plant with 
emphasis on labor procurement. The diary ended, “If I sent you one three months from now it would be entirely different.” With Wage Stabilization still in force, how to keep within the law and still run a business is occupying 70 per cent of one psychologist’s time in a mid-western company. One person delayed sending in his diary until union negotiations were over because he had done nothing “psychological” (this is a direct quotation) for the month he had been handling the negotiations.

* Presidential address delivered before the Division of Industrial and Business Psychology at the 1952 APA meeting in Washington, D. C.

Many of the diaries varied from day to 
day. They included conferences (we seem 
to run to conferences) on wage systems, in- 
cluding merit rating, conferences on train- 
ing ranging from salesmen to supervisors to 
hourly workers and including such detailed 
items as the purchase of an opaque projector 
for use in safety training—discussions on the 
editorial policy of a house organ—confer- 
ences on pension systems, and how to pre- 
pare the individual for retirement and some 
actual work with the individuals—attendance 
at a meeting on a proposed Stock Purchase 
Plan and so through the entire range of man- 
agement. Throughout most of the diaries 
was an occasional hour or two spent directing or actually doing work on research problems, and one sensed that in many instances there was a desire to do more, time only being lacking. One of the diaries ended with an hour devoted to consulting with one of his assistants on a research problem of selection, and then, honesty prevailing, he added a note, “This hour is really wishful thinking; it was only 15 minutes.”

The great volume of managerial work that 
we do and which was clearly brought out in 
the diaries was pointed up for me at the 
meeting of the psychiatrists and psycholo- 
gists at Asbury Park. You can count on the 
fingers of one hand the number of psychia- 
trists in private industry and of these few, 
only one was talking of management prob- 
lems and he apologized for his interest. The 
psychiatrists, whether they be dealing with 
charwomen or the president of a company, 
were all talking of individuals as individuals. 
Our interest in groups and in organizations 
was entirely lacking among them. This is a
ball which we are apparently carrying alone.

How and why have we gotten ourselves into this situation, for we are in it much more than doctors, lawyers or engineers? First, I believe, because we are a newer science and our field is much less defined. A problem must have at least a medical tinge before management goes to a medical department, but psychology being a bit vague in the mind of management, they feel free to turn to a psychologist on almost any problem. Second, because by and large we have felt rather complimented to get into many managerial functions and have taken them on willingly. Third, and I believe most important, as we go into managerial work we carry with us many fundamental psychological principles, and so influence management in the way that as psychologists we feel they should be influenced, and our influence is greater because we do not wear a tag which says “psychologists.” What are these principles that we carry over? I believe one of the most important is the principle of “Stop, look and listen,” which has been ground into us as scientists in all of our training. Management
has long been accustomed to getting the facts 
on financial problems, on machine operation, 
on costs in factory upkeep, etc., but many of 
their judgments on people have been on a
random basis of single cases—rumor and 
prejudices. I remember 30 years ago one 
man who had built up a big business selling 
office machines told me that all black-haired 
men were dishonest; at that point no amount 
of pointing out honest black-haired men had 
any effect on his prejudice and yet almost 
any one of us given a year or two and some 
tact could have worn him down and at least 
improved his evaluation of personnel. In 
any company it is a long selling job to show that persons’ reactions can be studied on a scientific
basis—that persons can be selected for any 
job with a fairly accurate prediction of suc- 
cess or failure—that they will react in certain 
ways to certain types of training—that what 
they want can be determined—that fair wage 
rates can be established that will take into 
consideration the difficulty of the job and the 
efficiency of the individual and will cause at 
least 50 per cent of the personnel to say the 
company is fair. One has only to make one 
or two sales of this type and they may at the 
beginning take a long time until one becomes 
a part of management. This is what I think 
has happened to us. We have made the 
sales. As management has grown to realize 
that their personnel is their chief asset, the 
person that can tell them about that person- 
nel has been drawn into decision making 
functions. 

It’s a long selling job because one must not 
only convince top management who probably
were already favorable before we were hired, 
but we must sell the idea all the way down 
the line that the scientific approach is going 
to make each person’s work more effective and 
take not a whit away from his own responsi- 
bilities. 

We have learned a great deal over the 
years; perhaps more than we have given and 
much more than we realize, but the final re- 
sult has been beneficial to both management 
and ourselves. Let us give an example. Our 
first study of the interview was a debunking 
of it. We showed very successfully that the 
average interview was a very weak tool for 
selection of personnel. For example—you all 
remember those experiments by Hollingworth, 
where 10 sales managers interviewed 20 men 
and if each picked the two that he considered
best, 18 would have been chosen. But where 
did this most interesting scientific experiment 
get us? Practically nowhere! As psycholo- 
gists we got excited but sales managers said, 
“How interesting” and went right on picking 
salesmen by interviews only. Then, grad- 
ually we modified the approach. In substance 
we said, “The interview is the tool by which 
the final decision must be made but what 
information concerning the individual can 
we give you—the sales manager—that will 
help in this final decision?” With research 
we showed that some tests were helpful—that 
there were certain ways of scoring an applica- 
tion blank that came out with an indication 
of success or lack of success. Management 
bought the results and we had learned to 
play on the team and playing on the team 
we could gradually make suggestions which 
changed somewhat the type of interview and 
helped to make it as an interview more suc- 
cessful. 

We have a mighty heritage of at least 
seventy-five years of psychological research 
back of us to which is constantly being added 
new and valuable data and ideas. Much of 
it is written down in our literature. Some 
has been handed to us by word of mouth 
from our professors and colleagues. It is well 
worth using and we are using it but the cri- 
ticism that our publications as psychologists 
in private industry have been too few and 
too superficial is probably just. 

For example, for at least three years many
of us have been worrying about the frustrated 
foreman. I think we have done something 
about it in our own companies both on the 
policy setting level and with the individual 
foreman or supervisor, but it’s the academic 
man who writes about it. We do not par- 
ticularly like what he writes—he puts the 
blame too much on the foreman and seeing 
the many complications that the foreman is 
meeting, and because we like him as a friend, 
we become a bit resentful. We talk about 
the Ivory Tower, and we may even quote 
Kipling about “the butterfly along the road 
preaching contentment to the toad.” But, 
we do not write about the frustrated foreman 
as we see him. We shake our heads, and 
say the subject is too big or too hot, and
what we write, if we write at all, is a small 
statistical study of how to select or rate the 
bench worker or the file clerk. Of course, 
being a psychologist I now modify my state- 
ment and say that there are exceptions, and 
some of these exceptions are outstanding. 
However, on my desk as I wrote this was the 
December, 1951 Journal of Applied Psy- 
chology, and the Winter Number of Personnel 
Psychology and with the everlasting compul- 
sion of a psychologist to count (we all seem 
to have this compulsion), I counted the 
articles in these two magazines. A total of 
14 articles were from persons connected with 
colleges—5 from the military force—2 from 
consulting firms—and one made up of two 
junior authorships from persons in private 
industry. I think this is fairly typical and 
certainly our showing is not good, and our 
friends in the consulting field although they 
did twice as well as we did, cannot pat them- 
selves on the back too much either. Together 
we contributed only a sixth of the articles in 
the two magazines where you would expect 
us to make the best showing. 

Why then the big difference between what 
we are doing and what we are reporting? 

There are many reasons but may I illus- 
trate a few and I am asking you to bear with 
me while I quote an experience of my own 
so far back that it no longer has a personal 
connotation. Twenty-five years ago we had an 
experiment on sound-proofing, by installing a 
sound-proof ceiling in a department for which 
we had good production records, and could 
continue those records after the installation. 
Based on the results we spent a half million 
dollars sound-proofing a new building. I am 
fully convinced that the decision was correct 
but I never published the results. The most 
amateur statistician could have shot them 
full of holes; a fear complex partly, but also 
a recognition that some results cannot be 
accurately measured. As one talks to psy- 
chologists in private industry one hears this 
often. I know one psychologist who has set 
up a new training program for supervisors. 
He is sure it is successful but he has no 
measured results. In discussing it he said, 
“If I had only thought to count the number 
of frowns I got in the department before the
training, and the number of smiles I get now, 
maybe the data would be statistically valid 
at at least the 5 per cent level.” At heart 
we are still strictly scientific. Practice has 
forced us to make decisions on bases which 
cannot be scientifically proven. We have 
learned that a workable solution on time is 
worth more than a perfect one too late, so 
we don’t publish. 

We are not alone in this dilemma. Dr.
Cameron of the National Industrial Health
Service, at the meeting of Psychiatrists and
Psychologists at Asbury Park, told us of
many health projects set up by industry, and
pleaded with us for some way of measuring
their success. In industry we usually cannot
set up experimental controls. For example,
Dr. Cameron in talking to me said, “You
have a visiting nurse? Does she pay for
herself?” I am sure she does, but I can’t
prove it. Of course, under laboratory conditions
we could set up controls; we could give
nurses’ services to half of our office force and
not to the other half. Then we could keep
track of absenteeism and turnover, and even make
morale surveys for the two halves, and perhaps
come out with proof, but can you see
any company being willing to set up such a
program? I certainly cannot see myself asking,
much less advising, the Aetna to go to
any such measures to prove something
we already think we know.

Perhaps we are still too conscious of our
heritage that any idea, to be worth publishing,
must represent research and valid proof.
Perhaps, knowing the complications of human
behavior, we become so involved when
we try to write in general terms and for popular
consumption, and put in so many “ifs and
buts,” that we get discouraged and leave the
writing to the nonpsychologists, who can go
all out for a given plan and forget the complications
of which we are so conscious. Perhaps,
since we do not need to publish to advance
in our work, and since we are fairly
busy, we get a little lazy and do not take
the time and energy to clarify our thinking
and put it down on paper for others to read
and maybe profit by. But the fact remains:
we do not publish.

Our Medical Department tells me
that in medicine there are two types of
journals: one of a strictly scientific nature,
where at least 80 per cent of the contributions
come from research centers, and another type
where the contributions are mostly from practicing
physicians, and maybe reports of single
cases or small groups of cases. No one expects
them to be valid research articles, but
they are often very suggestive.

Is this our solution? Perhaps, but before
it can work we must change some of our psychological
thinking. It’s a very thin and
sometimes wavy line between the fundamental
concepts that form at least a part of
our contribution to management and our acceptance
of the less rigid principles of proof
that prevail in the management field. Would
our publications in this less rigid field add
anything to the fundamental knowledge of
psychology? Perhaps the trees are so thick
that we do not see the forest, and perhaps we
have got to wait until some of our colleagues
who have been through the experience retire
and, gaining a distance that gives them
an objective viewpoint, become our spokesmen.
Perhaps our real function is that of a
liaison officer between our experimental workers
and management, in which role
our chief duty would be to keep very well
informed on both sides and display the ingenuity
to connect them, even when in many
cases the connection is far from obvious.
I know of one case where a strictly experimental
study by Birch and Rabinowitz on
the two-cord problem helped to bring about a
change in a sales training course for salesmen.

I realize that this talk has been full of
“perhaps,” which means that questions have
been raised and no conclusions reached. But
psychologists in private industry are only
about 100 strong, and we need the advice of
our consulting friends and especially of our
academic ones to help us see clearly where
our greatest contribution to a young but
fast-growing science lies.


Calling in Psychologists Early


A few years ago, military men found that 
a lot of new weapons were getting too com- 
plicated for the men who had to operate them. 
Industry found much the same thing with 
new plant equipment. Electronic and me- 
chanical devices had been added so fast that 
the human mind could not keep up. Ap- 
plied psychologists were called in to “hu- 
manize” the machines. 

Shortly after the end of World War II, the
psychologists were turned loose on the nearly
complete designs for new machines to be used
for military production. At that point, about
all the psychologists could do was change
dials for easier reading, color or illumination
for less eye strain, size or shape of knobs
and wheels for easier identification, and a
few other minor things that would not hurt
the over-all engineering. Obviously this
helped some, but it was not enough. Private
industry was even slower to tackle the problem,
largely because of the expense involved
in re-designing machines.

To remedy this, consultant firms, such as 
Dunlap & Associates, Inc., of Stamford, 
Conn., are campaigning for a place in the 
early design stages of equipment develop- 
ment. Most government groups favor this 
new approach, but industry, in general, is 
skeptical. Industry seems to agree that 
more consideration should be given to the 
human factors early in the design process. 
But it does not think that design engineers 
are going to be happy about psychologists 
butting into the blueprint phase of the prob- 
lem. (Business Week, December 20, 1952.) 

Wolfle, D., Buxton, C. E., Cofer, C. N.,
Gustad, J. W., MacLeod, R. B., and McKeachie,
W. J. Improving undergraduate
instruction in psychology. New York:
Macmillan, 1952. Pp. vii + 60. $1.10.


Surely it is even more true now than when
H. G. Wells made the statement that there
is a race between education and catastrophe.
The committee making this report is able,
headed by a man who was probably closer
than any other to the work of psychologists
during the war and in the immediate
post-war period. From this group might
therefore be expected a program having as a
major feature a broad and stimulating view
of the vital importance of psychology in the
present-day world.

But national and world affairs seem not
even to be mentioned in the volume. One would
never know there had been a world war! The
first chapter, on objectives, emphasizes “the
contribution which psychology can make to
a liberal college education,” but the concept
of such education seems formal and remote
from the current scene. Nor should the
major objective of first work in psychology
be to foster students’ “personal growth and
increased ability to meet personal and social
adjustment problems adequately.” A
four-page chapter on “Personal Adjustment
Courses” declares scornfully that “it is no
more justified to consider such a course as a
course in psychology than it would be to
substitute . . . a course on household repairs
for introductory physics” (p. 41). And
courses which “deal with special interest
areas or purport to provide technical training”
are only tolerated; there is admiration
for “a few conscientious departments, determined
to provide the best possible training
for students, which have recommended
that such courses be eliminated, even at the
risk of decreasing enrollments and displeasing
other departments” (p. 24). The major
chapter, on “The Recommended Curriculum,”
urges a first course giving “a systematic
presentation of scientific content”
followed by core courses on “motivation, perception,
thinking and language, ability,” and
advanced courses in social psychology, physiological
psychology, etc. A two-and-a-half-page
chapter on “Technical Training in
Psychology” suggests that after such an
undergraduate program, “a few months of
full-time vocationally-oriented training in a
post-A.B. institute could give the student a
battery of job skills” (p. 44). And the concept
of “liberal college education” becomes
fairly clear: a program in which psychology
need feel no responsibility for world, national,
or community problems, or student welfare,
or vocation, and may smugly go its
own self-centered way.

A six-page chapter on “Implementation of
the Curriculum” points out (for instance)
that, though the proposed program may bring
some reduction in the number of courses, staff
can be absorbed by laboratories. A brief final
chapter on “Research Problems underlying
the Curriculum” suggests (for example) appraisal
of the first course by the number of
students taking further work in psychology,
and touches briefly on methods of instruction.
In spite of its title, the volume deals
with this last topic only incidentally; it is given
over to emphatic declaration for a systematic
theoretical undergraduate program and impatient
belittling of alternatives.

Such a partisan position can be adequately
appraised only by a balancing comparison
with alternatives. Presumably an alternative
report might emphasize that indeed “wars
begin in the minds of men,” that psychological
warfare is more powerful than the H-bomb,
that psycho-social problems are major
in any nation and any community, and that
psychologists should courageously do anything
they can to bring general understanding
of these issues. It might be proudly confident
that psychology had much to offer
students in better understanding themselves
and their problems, and that such help could
be a vital part of a broad systematic treatment
of the subject. It might be exhilarated
by the vocational usefulness of much psychological
material, and find therein enrichment
of its essential subject-matter. Instead
of a statement which many would have considered
conservative and “professionally introvert”
thirty years ago, it might be a document
which would give college administrators
and faculty in other departments a stimulating
view of psychology as a science rapidly
advancing and eager to cooperate in efforts
to build educational programs more fully
meeting the problems of the present world.
The one-sidedness and inadequacy of the
present little volume seem to the reviewer
to emphasize a need for such an alternative
document. Lacking it, the hope must be
that the Education and Training Board may
take a position more positive and forward-looking.
Sidney L. Pressey 
Ohio State University 


Hirsh, I. J. The measurement of hearing. 
New York: McGraw-Hill Book Co., 1952. 
Pp. ix + 364. $6.00. 


This book is concerned with the informa- 
tion about acoustics, electro-acoustic equip- 
ment, psychology of hearing and related topics 
that is basic to both the clinical and experi- 
mental measurement of various aspects of 
hearing. Written by an experimental psy- 
chologist thoroughly acquainted with psy- 
chophysical methods, the treatise stresses 
psychological contributions. It is designed 
for use as a reference by those engaged in 
measuring and treating hearing disorders, as
a text for those preparing for this kind of 
clinical work, and as reference material for 
all those interested in hearing. 

An introduction to psychophysical meas- 
urement is followed by discussion of the 
principles of sound and electricity basic to 
control, production and measurement of audi- 
tory stimuli. These principles are then 
applied to operation of electro-acoustic equip- 
ment. Various kinds of auditory measure- 
ment used in clinical audiometry are dis- 
cussed in relation to clinical procedures. 
Each section on auditory measurement is 
followed by clinical applications and the last 
chapter is devoted entirely to clinical audio- 
metry. 

Applied psychologists will be interested 
mainly in the sections devoted to clinical ap- 
plications. Nevertheless, a clear understand- 
ing of the principles underlying sound clinical 
practice is possible only if the sections on 
experimental findings are consulted. 

Although the author has done an excellent
job of organization and clear exposition with
highly technical subject matter, the reader
must not expect to absorb the material without
concentrated study. This excellent book
is a must for anyone preparing to do research
in the measurement of hearing as well as for
all those with a serious interest in the field.
Readers will be especially grateful for the
completeness of the information that accompanies
the figures and for the glossary of
technical terms. It is probable that the
book will find its greatest use as a reference
work on techniques of measurement and clinical
applications.


Miles A. Tinker
University of Minnesota


Campbell, C. M. (Editor). Practical ap- 
plications of democratic administration. 
New York: Harper & Brothers Publishers,
1952. Pp. 325. $3.00. 


Generally speaking, school administrators
in the United States have voiced their allegiance
to the broad concepts of democracy.
How these concepts can be given expression in
schools and school administration is far from
clear to many superintendents and principals.

Practical Applications of Democratic Ad- 
ministration represents in condensed form the 
thinking of a dozen scholars on this question. 
Their contribution is predicated upon a sound
philosophical basis but turns early to practical
applications based upon research and
experience.

Leadership in educational administration is 
the ribbon which binds the separate con- 
tributions together in a package which should 
be both appealing to, and much sought for 
by, all professional educators. The two chap- 
ters dealing with sociology and psychology 
are strikingly illustrative of the integration of 
broad fields of understanding which democratic
leadership must draw upon. These
same two chapters offer to many present-day
school administrators a challenge to consider,
more realistically than they have
been accustomed to, the social forces in their
school communities and the vestiges of autocracy
which individuals and groups have
inherited.

Readers are not left with a “so what”
attitude, because seven chapters follow immediately
and describe in clear expository
style actual administrative leadership practices
in a number of school communities.
At times it is somewhat difficult to align the
separate examples with specific prongs of the
foregoing theory. This is, however, understandable
because democracy does not propose
to “blueprint” practice. Instead, the
stream of democracy is fed by leadership
from local tributaries, each carrying in suspension
particles unique to its own fields of
origin.

The role of administrative leadership is not
easy, as the concluding chapters point out,
yet the present volume is filled with the
necessary materials. Although democratic
idealism seems to have swept the country,
the next step suggested for educators is to
reach consensus on somewhat more specific
points of both theory and practice. It is unlikely
that this can be accomplished in a
setting which is pessimistic.

The implication is clear that a science of 
human engineering coupled with educational 
statesmanship has germinated and is in seri- 
ous need of cultivation. Neglect in this area 
points to a spotty, blighted harvest that will
represent only a fractional part of the poten- 
tial. Therefore, carefully formulated experi- 
mental programs like those cited in this brief 
volume should become the usual rather than 
the unusual practice of current and future 
educational leadership in our democratic so- 
ciety. To do this will take school superin- 
tendents and principals away from many of 
their present routines and “behind the desk” 
management activities out into the com- 
munity. This will not sell the school short, 
however, because the community will bring 
back to it a richness otherwise unattainable. 

Hugh M. Shafer
School of Education,
University of Pennsylvania


Dooher, M. J., and Marquis, Vivienne (Eds.). 
The development of executive talent. New 
York: American Management Association, 
1952. Pp. 576. $6.75. ($5.75 to AMA
members.)


For anyone who wants to develop a centralized
planned economy in the United
States, this book will offer little of value, for
it is oriented to this view: “Management’s role in the
preservation of a free society is putting the
real meaning of a free society to work within
the organization for which each individual
executive is responsible. The basic objective
is the development of individuals. . . . The
basic purpose of management is absolutely
consistent with that of a free society, and
the individual manager’s responsibility is to
work that way.”

For anyone who has negative reactions to 
“management-minded” research and publica- 
tions or who feels that business management 
as a function in our society is part of our 
reactionary past, this book will offer little of 
interest. For this book is based upon the 
principle of “management as a profession, a 
science, and an art.” It recognizes that 
management has an aggressive, dynamic role 
to play in the major national and interna- 
tional struggle between two ways of life, but 
that the successful performance of this role 
is greatly dependent upon the development 
of capable leadership. The purpose of the 
book, then, is to bring together the produc- 
tive experiences and practices of many or- 
ganizations in their efforts to produce man- 
agement and executive personnel who will 
function most effectively in achieving the 
goals of a free society. 

From this it should not be inferred that the 
book is either political or controversial. The 
main body of the book is divided into nine 
parts, consisting of 50 chapters, contributed 
by 44 authors from business and educational 
organizations. The subjects covered include: 
setting up the program—basic principles and 
practices; organization planning; putting the 
program into action; conference training 
methods; special approaches, techniques and 
programs; getting results from follow-up 
counseling; program evaluation; and trends 
in management development. The remain- 
ing pages, consisting of about one-half of the 
entire volume, are devoted to case study re- 
ports of methods used by such companies as 
Standard Oil (N. J.); United Parcel Serv- 
ice; Sears, Roebuck and Co.; Detroit Edison; 
U.S. Rubber; Westinghouse; and others. In 
addition, there is an extensive bibliography 
of approximately 400 items divided under a 
variety of sub-topics relating to leadership 
and management. 

Most of the chapters offer a combination 
of practical “how to do it” material and dis- 
cussions from the experimental literature. 
Although some academicians may be disappointed
in a large part of the material, it does
present a carefully planned compromise ap- 
proach for the practicing businessman and 
the classroom educator. While no step-by- 
step solution to individual problems is offered, 
a pattern of action is noted. In the final 
analysis, the book contains specific, prac- 
tical guidance on all of the problems involved 
at every stage of planning and administra- 
tion—from the analysis of needs, through the 
discovery of latent executive ability, to the 
inventorying, rating and development of 
executive skills. 


C. G. Browne 
Wayne University 


Judd, D. B. Color in business, science, and 
industry. New York: John Wiley & Sons, 
Inc., 1952. Pp. 401. $6.50. 


The measurement and specification of color 
have undergone great advances during the 
past thirty years. This book should prove a 
fruitful venture because, during the period 
of maximum development in colorimetry, the 
author has become an outstanding authority 
and has held a strategic position at the National
Bureau of Standards. The scope of
the book is broad and records everything that
has appealed to the writer as pertinent or
interesting in respect to color. Such a treatment
cannot be expected to be complete, since
the thinking of the author is set down as final
without consideration of alternative facts and
theories.

The background of the book is physical 
both in the material presented and in the 
point of view. Other influences have made 
themselves felt, but have not altered the 
treatment. Psychology, for example, is men- 
tioned frequently, but one discovers that it 
refers to “the customer’s angle” rather than 
to available scientific material. Much is said 
about psychophysics, but it is not the classical
psychophysics of Fechner, Müller and
Titchener. Methodological considerations of 
the operations necessary for adequate ob- 
servation are not involved. An implied defi- 
nition is that visual psychophysics is a study 
of radiation modified by the sensitivity of 
the eye. 

The presentation is in three parts without 
further conventional subdivision into chapters.
Part I is a compilation of “basic facts”
which have entered into the author’s thinking
on the subject. The treatment is unsystematic.
Materials of physiological, physical
and psychological character are intermingled.
No distinction is made between data and
hypotheses, and the philosophy is that of
naive physical realism. Nevertheless, the exposition
proceeds from the physiology of the eye
to the tristimulus mixture of colors, taking in
radiation along the way. It is difficult to assess
the pertinence of various topics. The reader 
is forced into an item by item evaluation 
which must be confusing to a novice in the 
field. It would have been sufficient for the 
remainder of the book had the author limited 
the introduction to a statement of the tri- 
stimulus hypothesis and the development of 
its colorimetric implications. As the discus- 
sion stands, it is not clear precisely what the 
author himself holds to be the “three color 
hypothesis.” He states that “we have seen 
that normal color vision is tridimensional” 
(p. 67), but whether this derives from “the 
fact that we get three independent kinds of 
information from the cones, light-dark, red-green,
yellow-blue” (p. 18) or from the hypothesis
that “some of the cones contain
short-wave absorbing (V) pigment, some con- 
tain a preponderance of long-wave absorbing 
(R) pigment, and some contain a preponder- 
ance of middle-wave absorbing (G) pigment” 
(p. 18) is not indicated. 

In Part II we have the meat of the book 
under the title “Tools and Technics” of 
colorimetry. It occupies some two-thirds of 
the text and gives precise and comprehensive 
information needed to carry out measure- 
ments of color and to specify them in one of 
the alternative systems of notation. The re- 
viewer considers this work one of the out- 
standing presentations of colorimetry, one 
that may very well become the standard ref- 
erence in the field. 

It is regrettable that the author felt the 
necessity to go beyond this field. Business 
and industry will hardly be concerned with 
the physiological hypotheses of vision or the
psychophysical techniques of psychology. 
Moreover, no one can expect to speak with 
authority on all subjects. 


Forrest L. Dimmick
U. S. Naval Medical Research Laboratory,
New London, Connecticut